ABSTRACT


This report documents the research performed under RADC
Contract No. F30602-80-C-0139 by Northwestern University for
developing effective methodologies for software maintenance.
This contract is a follow-on to Contract No. F30602-76-C-0397
and focuses on refining, expanding and automating software
maintenance concepts and techniques developed under the previous
contract.

During this contract period, significant progress was made
in developing techniques for specifying and realizing software
modification proposals, logical ripple effect analysis and
module revalidation after modification. These techniques and
the performance ripple effect analysis technique developed
during the previous contract period were demonstrated using a
DEC VAX 11/780 computer. In addition, a number of software
metrics related to modifiability, such as measures for logical
and performance stability, module strength and coupling, were
developed. Limited experiments for validating the logical
stability measures were performed.

In this report, research results which were presented in
published papers are summarized, and unfinished and unpublished
work is presented in detail. Publications and technical
personnel related to this project are also summarized.
Published papers presenting the work supported by this contract
are included in the Appendix.

1.0 INTRODUCTION

This report summarizes the research performed under
Contract No. F30602-80-C-0139 by Northwestern University for
Rome Air Development Center during the period from April 23,
1980 to November 30, 1982.

The original objective of this effort was to conduct
exploratory development of techniques for the design,
implementation, validation and evaluation of reliable and
maintainable software systems. This effort was intended to be
a follow-on to Contract No. F30602-76-C-0397, "Self-Metric
Software" [YAU80a, 80b, 80c], and would focus on refining,
expanding and automating software maintenance concepts and
techniques developed under the previous contract.

The original effort was planned for a period of three
years, starting April 23, 1980. However, because of some
difficulty in continued funding, this project was re-scoped in
September 1981 and had a lower level of funding starting in
FY82. Starting in October 1981, the project was re-directed as
follows: to complete the development of those techniques which
could be completed in FY82, and to complete the development and
perform some preliminary validation of the logical stability
measures of programs, which measure the resistance of the
programs to logical ripple effect due to modifications. In this
report, research results which have been presented in previous
papers and interim technical reports are summarized, and
unfinished and unpublished work is presented in more detail.
Publications, presentations and technical personnel related to
this project are also summarized.

During this contract period, we have made significant
progress in developing techniques for specifying and realizing
software modification proposals, logical ripple effect analysis
and module revalidation after modification. These techniques
and the performance ripple effect analysis technique developed
during the last contract period have been demonstrated using a
DEC VAX 11/780 computer. In addition, we have developed a
number of software metrics related to modifiability, such as
measures for logical and performance stability, module strength
and coupling. Limited experiments for validating the logical
stability measures have also been performed.

2.0 SOFTWARE MAINTENANCE PROCESS AND ASSOCIATED QUALITY FACTORS

The software maintenance phase is the most time-consuming
and costly part of the software life cycle [BOEH73], [ZELK78],
[LIEN80]. However, the activities carried out during this
phase are deeply affected by the process of software
development, since the purpose of software maintenance is to
modify the products of the software development process.

2.1 The Activities Of Software Maintenance

We conceive the software development process as shown in
Figure 2.1. The first activity of software development is to
study the application area and define the requirements for a
new software system for the particular application problem.
This activity involves the participation of representatives
from both the users of the software system and from the
software development organization. The second activity (or set
of activities) is known as software design. This activity may
involve the definition of several intermediate stages during
which a system is being developed to meet its requirements.
These intermediate stages may be known as, for example,
architectural design, subsystem design and module design. This
activity is normally carried out exclusively by members of the
software development organization, although some user


[Figure 2.1: Requirements Analysis -> Software Design -> Coding -> Testing]


development process. These last two activities are exclusively
carried out by members of the software development
organization, although they are frequently performed by people
who were not involved in the previous activities of preparing
software requirements and design. When the system has been
tested "successfully", it is released to the users and enters
an "operational" phase. To the programmers who must work with
the system, this phase is more commonly known as the
"maintenance" phase.

The software maintenance phase is in some sense a
repetition of the activities of the software development
process. Although maintenance objectives include improving
software performance, correcting errors, transferring software
systems to new computer system configurations and deleting
obsolete features, the most frequent objective is to increase
system functionality by adding new features or by improving
existing features. Thus, it is again necessary to discuss the
requirements for the software system with the users; it is
again necessary to perform software design; and, finally, it
is again necessary to perform coding and testing. However,
there is one fundamental difference in these activities when
they are carried out during the software maintenance phase:
these activities must now be carried out in the context of an
existing, operational software system. It is important for the
software maintenance personnel to have an understanding of the
process which was used to develop the software system. They
must know not only what the operational system is and does, but
also how and why it does so, since they will have to change the
requirements, redesign the software, modify the programs and
test the new implementation based on various demands. The
traditional approach to providing information to assist with
these tasks is by means of "system documentation". Many
techniques have been developed to document software systems,
but they tend to be incompatible and not sufficiently
comprehensive to describe the entire software development
process (e.g. HIPO [STAY76]). We have developed a model which
is suitable for describing software systems' requirements,
designs and programs. In addition, this model also permits
individual software requirements to be traced through the
intermediate levels of software design to the final programs of
the system. This tracing capability is essential for the
maintainer of a software system, who must be able to understand
and modify the system rapidly and correctly.

Although it is important to identify the correspondence
between the requirements which are to be changed and the code
which must be changed as a result, there are several tasks
which must be performed by the maintenance personnel before the
modified software system can be made operational again. These
tasks constitute our software maintenance methodology [YAU78,
80a, 80e, 82c], and they are shown in Figure 2.2. After
determining which parts of the software system must be changed
in order to effect the modification request, software changes
must actually be carried out, their consequences must be
analyzed, and the modified system must be retested.

In the following sections we will describe our approaches
to each of these problems of software maintenance. In Section
3, we will describe a software system model which may be used
to trace the correspondence between the software requirements,
software designs and programs of large-scale software systems.
In Section 4, we will then summarize our approach for improving
the reliability with which the program code can be modified,
using a program slicer to assist in locating the code to be
modified and a structure-oriented editor to make the
modifications free from syntax errors. In Section 5, we will
summarize our ripple effect analysis technique, which is used
to analyze the effects of the program modifications on the
behavior of the program. This static analysis technique allows
potential logical and performance changes to be identified.
The final phase of our methodology is to retest the modified
system. In Section 6, we will summarize our module testing
technique, which reuses existing test cases whenever possible
to reduce the retesting effort.

2.2 Tools And Techniques For A Software Maintenance Environment

The techniques for realizing program modification
proposals, ripple effect analysis and module testing have been
demonstrated by implemented programs running on a DEC
VAX-11/780 computer under the VMS operating system. The
technique for defining and tracing the correspondence from
software requirements, via software design, to program code was
not implemented during the time available.

Tools for software maintenance should be able to share a
common program representation. This integration of tools
provides maintenance programmers with a standardized
environment for performing maintenance activities. We have
developed a formal program representation to support the tools
described in Section 4, which permits an efficient
implementation of our tools for program modification. In
addition, we have developed efficient representations of
programs for implementing each of the ripple effect analysis
techniques. While this approach to implementing software tools
is sufficient to demonstrate the validity of individual
techniques, software tools based on these techniques will be of
greater practical value if they share a common model of the
program. A more flexible program model, such as the
Hierarchical Graph model [YAU80d, 81a, 82b], provides the means
for combining different software tools into an integrated
software maintenance environment. Like the model used by our
syntax directed editor, this model is based on the abstract
parse tree of programs. Since it also includes detailed
information about data flows in the program, it appears to
provide a suitable basis for integrating our individual
software maintenance techniques into a set of practical,
cooperating tools.

The ripple effect analysis techniques have been developed
to perform exhaustive analysis in the sense that they are
capable of identifying all blocks of a program which may be
affected by a program modification [YAU78, 80a, 80b, 80c],
[HSIE82]. However, to implement such a technique as a
practical tool requires that we allow the maintenance
programmer to restrict the tracing of ripple effects in
accordance with his/her own understanding of the software
system. Our implementation of the logical ripple effect
analysis technique permits the programmer to interact with the
analysis program to select certain procedures for analysis and
to remove others from consideration. Additional effort on the
interface to these tools would be needed to improve their
practical effectiveness.


Although our work has been to develop techniques for
software maintenance, they are also useful during certain
stages of software development. Our approach to realizing
software modification proposals, for example, uses a syntax
directed editor, a tool which is also very useful for the
initial writing and debugging of programs. Furthermore, the
activities involved in debugging a program require the
identification of two types of code: the first may cause
certain unintended effects (bugs), the second may be affected
because of changes made to repair bugs. The program
slicer (Section 4) has been developed to identify code of the
first type, while ripple effect analysis (Section 5) is
intended to identify code of the second type. In practice, we
would expect these tools to be used even more effectively in
the development phase, since the development programmer can
take advantage of his/her familiarity with the program under
development.

2.3 Quality Factors Affecting Software Maintenance

One important concept which runs throughout the entire
software maintenance methodology is the use of software
metrics. Our long term goal is to develop a software metric
for modifiability, that is, a quantitative indicator of the
amount of effort required to make changes to particular
programs or modules. We have already developed some
measures of certain attributes of modifiability, which will be
described in detail in Section 7. The earliest measures which
we have developed are those for the logical stability of
programs and modules [YAU80e]. These are based on our ripple
effect analysis technique, and have been proposed as indicators
of the resistance of a program or module to ripple effects as a
result of changes made to it. We have also developed a measure
for the logical stability of program design [YAUSSc], since we
recognize the value of an early indication of deficiencies in
the quality of a software system. However, a metric will not
really be useful until it has been shown to correlate with the
phenomenon which it is supposed to measure. We have,
therefore, devoted some additional effort to the validation of
our proposed stability metrics, and the preliminary results of
our validation experiments will also be presented in this
report.


These  requests  usually  refer  to  the  interface  which  already 
exists  between  the  software  system  and  its  operating 
environment.  It  is  with  such  change  requests  that  the  process 
of  specifying  software  maintenance  proposals  begins. 

In order to correctly modify a software system, it is
necessary to understand the relationship between the change
requests and the programs which make up that system. Since
this requires a clear understanding of both the behavior of
those programs and the effects of the requested changes, a vast
amount of effort or prior experience with the system is
necessary. In the absence of such effort or experience, the
most logical alternative is to record information which
describes the relationships between the program code and the
software system's application area.

The programmers' view of the same system is shown in
Figure 3.2. During the maintenance process, these two views of
the same software system (the users' view and the programmers'
view) must be reconciled in such a way that the enhancements
requested by the user are implemented. This requires changes
to be made in both the users' and the programmers' views of the
system, and these changes must continue to be compatible with
each other.




programmers  must  preserve  a  "semantic  equivalence"  between  the 
users'  new  view  and  the  programmers'  new  view. 


[Figure 3.3 diagram: Users' Old View -> (Modification Request) ->
Users' New View; Programmers' Old View -> (Modification) ->
Programmers' New View; each users' view is related to the
corresponding programmers' view by Semantic Equivalence.]

Figure 3.3. Equivalence preserving requirements for reliable
software modification.

So  far  we  have  only  discussed  the  problem  of  specifying  a 
software  maintenance  proposal  in  a  very  abstract  manner.  Now, 
we  would  like  to  consider  some  of  the  practical  problems, 
especially  those  involved  in  providing  automated  assistance  for 
the  maintenance  personnel  who  must  make  the  "semantic 
equivalence  preserving"  modification. 

The first problem is to describe software systems using
formal notations or formal descriptions. Since we cannot
expect any automated assistance in dealing with informal
notations or descriptions, we must ensure that all notations
used to describe the software system have been formalized as
much as possible. In dealing with the programmers' view of the
system, we are on fairly solid ground with respect to formal
notation, since all programming languages must at least have a
well-defined syntax to allow automatic elimination of some
programs which are clearly incorrect. In addition, all
programming languages must have a semantic definition so that
the programmer can predict the behavior of the code being
written. However, these semantic definitions are frequently
informal, are often subject to implementation constraints and
occasionally permit several interpretations. When dealing with
the users' view of a software system, we cannot expect that a
very formal notation is being used. The best we can hope for
is that parts of the system have been defined in a notation
such as RSL [ALFO77], SADT [ROSS77] or SA [DEMA78], which have
varying degrees of formalization. If we do not have such a
description of the system, one must be developed; otherwise,
we will be unable to have any precise idea of what a change
request entails until we have found the relevant program code
which must be changed. One major problem with this approach is
the likely existence of discrepancies (though perhaps minor)
between the users' actual concept of the system operation and
the programmers' description of that concept. However, given
formal descriptions of these two views of the software system,
we can proceed to study the effects on the one of changes made
to the other. In order to deal with these issues, we will
develop formal models of the different views of the software
system and proceed by working with these models.

3.1 A Model Of Software Systems For Software Maintenance

The most important questions to answer when we decide to
model the processes and products of software maintenance are
what to model and how to model it. We now describe how we have
approached these problems, and explain the reasons for our
choices. We will then present the details of our modelling
approach.

3.1.1  Background 

The major activity of the software maintenance process is
to make changes to existing documents which describe a software
system. These changes may be trivial or substantial, optional
or essential. They may be carried out by a single person or by
several independent groups of people. Since these documents
are interdependent (for example, the design document is derived
from the requirements document), we must also be able to model
the process of changing a document in response to changes in
another document. It is frequently necessary to retain several
versions of each document, and therefore we must also control
modifications so that they are made to the correct version and
in the correct sequence. Thus, we have identified the
following three major activities for which our model is needed:

1) Modifications to a single software document, by either (a)
a single programmer or (b) several independent programming
groups.

2) Replacement of portions of a software document in response
to changes made to its source document.

3) Control of different versions of individual documents and
their interdependencies.

In practice, the software documents which must be modified
may exist in either a textual or graphical form. However, in
both cases there is a substantial amount of context sensitive
information present in the document. Due to the limitations of
the descriptive power of strings (and even trees) when
modelling context sensitive information, some other approach is
required. Therefore, we have chosen to use graphs, with their
greater descriptive power, to directly show context sensitive
properties.

Having adopted the graph as the basic representation for
software documents, we must express changes to these documents
as graph modifications. Graph modifications are commonly
described by means of graph rewriting systems. Using existing
methods for studying graph rewriting, it is possible to control
concurrent access to a software document, since tests have been
developed to check if two separate modifications to a single
graph are sequential independent (may be executed in either
sequence) or parallel independent (may be executed
concurrently). These checks are necessary when several groups
work together to modify a large software system. When the
modifications are interdependent, these tests may be used to
identify the interface (or interaction) region of the two
modifications on the graph.

3.1.2 Graph Rewriting Systems

Graph rewriting systems have become a topic for research
in recent years [CLAU79], primarily as a result of the great
significance which graphs and graph theoretic concepts have
assumed in computer science and engineering. Since we wish to
model the processes of software maintenance, and to do so in a
very abstract manner, it is natural to examine the use of such
an abstract tool, particularly in view of the preponderance of
graph representations for software requirements and design.

3.1.2.1 Graph Rewriting

To rewrite a graph means that we will apply a set of
rewriting rules to the graph, one by one, in some sequence, to
construct another graph. A rewriting rule corresponds so
closely to a production rule of a grammar for a language that
graph rewriting systems are also known as "graph grammars".

Each graph rewriting rule has a left-hand side and a
right-hand side, each of which is a graph. To apply a
rewriting rule with the left-hand side L and the right-hand
side R to a graph G, it is first necessary to locate an
instance of the graph L as a subgraph of G. If no such
instance exists, then the rewriting rule cannot be applied. If
such an instance does exist, then it should be (conceptually)
deleted from G, giving rise to the graph G - L, and then the
graph R should be (conceptually) added to G in its place,
giving rise to a new graph H = (G - L) + R.

The  most  difficult  part  of  the  entire  process  is  to  embed 
the  right-hand  graph  R  into  G  in  place  of  L.  When  strings  are 
being  rewritten,  this  embedding  of  the  right-hand  side  is  made 
obvious  by  the  implicit  left  to  right  ordering  of  the 
characters  in  the  string.  This  is  illustrated  in  Figure  3.4. 
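
The string case can be sketched in a few lines of Python. This is
only an illustration, assuming the rule a => bcd and the starting
string "baabc"; the function name rewrite_leftmost is hypothetical
and is not part of the tools described in this report. Because a
string has an implicit left to right order, the right-hand side is
simply spliced in where the left-hand side was found.

```python
def rewrite_leftmost(text: str, lhs: str, rhs: str) -> str:
    """Apply the rule lhs => rhs once, at the leftmost occurrence of lhs."""
    position = text.find(lhs)
    if position < 0:
        # No instance of the left-hand side: the rule cannot be applied.
        raise ValueError(f"rule {lhs!r} => {rhs!r} does not apply to {text!r}")
    # Embedding is trivial for strings: keep the text on either side of the
    # match and splice the right-hand side in between.
    return text[:position] + rhs + text[position + len(lhs):]


# Two successive applications of the (assumed) rule a => bcd.
step1 = rewrite_leftmost("baabc", "a", "bcd")
step2 = rewrite_leftmost(step1, "a", "bcd")
print(step1, step2)
```

No such implicit ordering exists for graphs, which is why the
embedding must be specified explicitly there.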

Rule

     a => bcd     (Replace "a" by "bcd")

Applications

If the original string is "baabc" then the following applications
of the rule may be made.

1)  baabc => b a abc => b bcd abc => bbcdabc

2)  bbcdabc => bbcd a bc => bbcd bcd bc => bbcdbcdbc

Figure 3.4. An example for rewriting a string.

In Figure 3.5 we show the difficulty involved in embedding
a graph within a graph. The rewriting rule shown there
requires that we replace the node labelled "a" by a subgraph
consisting of three nodes (labelled "b", "c" and "d") and two
arcs (from "b" to "c" and from "c" to "d"). Clearly, the
rewritten graph must contain five nodes, labelled "b", "c",
"d", "e" and "f". In addition, it must contain arcs from "b"
to "c" and from "c" to "d". However, what should be done with
the arcs in the original graph from "a" to "e" and from "a" to
"f"? That is, how should the new subgraph be embedded into the
original graph? The approach which we have adopted is to
assign integer labels to certain nodes or arcs in the rewriting
rule, with the constraint that any integer which appears on the
left-hand side of the rule must also appear on the right-hand
side. The interpretation of this assignment of labels is that
when a rule is applied, any node labelled "i" on the left-hand
side is considered to be replaced by the node labelled "i" on
the right-hand side, so that any arcs incident to (or from) that
node in the original graph should be incident to (or from) the
replacement for that node in the graph on the right-hand side.
In Figure 3.5, there is only one node to be replaced (labelled
"a") and its replacement node is the node labelled "c" (as
shown by the integer label "1"). Thus, the rewritten graph is
the one shown at the bottom of that figure.

Figure 3.5. An example for rewriting a graph.
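
The embedding convention described above can be sketched as
follows. This is a minimal illustration restricted to rules whose
left-hand side is a single node. The function rewrite_node, the
dictionary-and-set graph encoding and the node identifiers are all
assumptions made for the example; the starting graph (a node "a"
with arcs to "e" and "f") is reconstructed from the text, since the
figure itself is not reproduced here. The parameter `replacement`
plays the role of the right-hand-side node that carries the same
integer label as the node being replaced.

```python
from itertools import count
from typing import Dict, Hashable, Set, Tuple

Arc = Tuple[Hashable, Hashable]


def rewrite_node(nodes: Dict[int, str], arcs: Set[Arc], old: int,
                 rhs_nodes: Dict[str, str], rhs_arcs: Set[Arc],
                 replacement: str):
    """Replace node `old` by the subgraph (rhs_nodes, rhs_arcs).

    Arcs that were incident to or from `old` are redirected to the
    right-hand-side node named by `replacement`.
    """
    # Give every right-hand-side node a fresh identifier in the host graph.
    fresh = count(max(nodes) + 1)
    rename = {name: next(fresh) for name in rhs_nodes}

    # G - L: drop the replaced node; then add the nodes of R.
    new_nodes = {n: label for n, label in nodes.items() if n != old}
    new_nodes.update({rename[name]: label for name, label in rhs_nodes.items()})

    def redirect(n):
        return rename[replacement] if n == old else n

    # Old arcs follow the replacement node; the arcs of R are added as given.
    new_arcs = {(redirect(s), redirect(t)) for s, t in arcs}
    new_arcs |= {(rename[s], rename[t]) for s, t in rhs_arcs}
    return new_nodes, new_arcs


# Assumed host graph: a -> e and a -> f.
nodes = {1: "a", 2: "e", 3: "f"}
arcs = {(1, 2), (1, 3)}
new_nodes, new_arcs = rewrite_node(
    nodes, arcs, old=1,
    rhs_nodes={"x": "b", "y": "c", "z": "d"},
    rhs_arcs={("x", "y"), ("y", "z")},
    replacement="y",  # the node labelled "c" replaces "a"
)
labelled_arcs = sorted((new_nodes[s], new_nodes[t]) for s, t in new_arcs)
print(labelled_arcs)
```

With "c" as the replacement node, the arcs that left "a" now leave
"c", in addition to the arcs from "b" to "c" and from "c" to "d".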


3.1.2.2 Definition Of A Labelled Graph

To formalize the graph representations for software
systems, we define a labelled graph as follows: A labelled
graph is an 8-tuple,

     G = (N, A, LN, LA, sN, tN, nL, aL),

where N is a set of nodes,

      A is a set of arcs,

      LN is a set of node labels,

      LA is a set of arc labels,

      sN, tN : A -> N are functions which map each arc to
      its source and target nodes (respectively),

      nL : N -> LN is a function which maps each node to
      its label,

      aL : A -> LA is a function which maps each arc to
      its label.
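
As an illustration, the 8-tuple can be written down directly as a
data structure. The sketch below is one possible encoding, not the
representation used by the tools in this report: the class name
LabelledGraph, its field names and the method is_well_formed are
assumptions, with each field annotated by the component of the
tuple it stands for. The four functions are stored as dictionaries,
and the check confirms that they are total and map into the
declared sets.

```python
from dataclasses import dataclass, replace
from typing import Dict, Hashable, Set


@dataclass
class LabelledGraph:
    nodes: Set[Hashable]                   # N
    arcs: Set[Hashable]                    # A
    node_labels: Set[Hashable]             # LN
    arc_labels: Set[Hashable]              # LA
    source: Dict[Hashable, Hashable]       # sN : A -> N
    target: Dict[Hashable, Hashable]       # tN : A -> N
    node_label: Dict[Hashable, Hashable]   # nL : N -> LN
    arc_label: Dict[Hashable, Hashable]    # aL : A -> LA

    def is_well_formed(self) -> bool:
        """Check that sN, tN, nL and aL are total functions into the right sets."""
        arc_maps = (self.source, self.target, self.arc_label)
        if any(set(m) != self.arcs for m in arc_maps):
            return False
        if set(self.node_label) != self.nodes:
            return False
        return (set(self.source.values()) <= self.nodes
                and set(self.target.values()) <= self.nodes
                and set(self.node_label.values()) <= self.node_labels
                and set(self.arc_label.values()) <= self.arc_labels)


# A small example graph: node "a" with arcs to "e" and "f", arcs unlabelled.
g = LabelledGraph(
    nodes={1, 2, 3}, arcs={"x", "y"},
    node_labels={"a", "e", "f"}, arc_labels={""},
    source={"x": 1, "y": 1}, target={"x": 2, "y": 3},
    node_label={1: "a", 2: "e", 3: "f"},
    arc_label={"x": "", "y": ""},
)
# The same graph with arc "y" pointing at a node that is not in N.
dangling = replace(g, target={"x": 2, "y": 4})
print(g.is_well_formed(), dangling.is_well_formed())
```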

3.1.3 The Intraphase Model

The software documents used to describe each phase of the
software development process will each be modelled by a set of
interconnected components of the software system. We represent
each component by its control flow, data flow and data
structures, and also by its relationships to other components.
Its interface with other components is stated in terms of
objects required from other components and objects provided for
other components. A software system is simply a collection of
such components, with a distinguished initial (or master)
component.

3.1.3.1 Software Components

A software component is an executable object which
contains several subcomponents. These subcomponents are:

a control flow structure (in a form to be described),

a set of data structure graphs (of a similar form),

a set of data flow triples, whose executable objects are
"leaves" of the control flow graph and whose (input and
output) data objects are data structure graphs, and

a set of distinct object names, each of which refers to a
data structure graph or module which defines the structure
of that object.

3.1.3.1.1 Control Flow

The following notation will be used to describe the
control flow of a software system component. Since it
emphasizes only the relative ordering of activities, this
notation is independent of the particular notation being used
to describe the component. We have confirmed that it can be
used to describe most of the control flow properties of a
requirements definition in RSL or a program in PASCAL. This
notation uses the formalism of a labelled graph, using nodes to
represent "activities" and arcs to represent relationships
between these.

Let us now introduce our notation. First of all we
specify the basic notation completely, and then describe the
remainder of the notation informally.

A basic, structured, sequential control flow description
is a labelled graph with

     LN = {TASK, LOOP, AND, OR} U Z+ U {e},

     LA = Z+ U {e},

where Z+ denotes the positive integers {1, 2, 3, ...} and e
denotes the empty string.

The use of these symbols is now explained informally: The
graph is a rooted tree structure, directed downwards. Nodes
labelled by LOOP, AND or OR are referred to as structured
nodes. e-labelled nodes are referred to as primitive nodes.
Nodes and arcs labelled by e are referred to as e-labelled.
Nodes and arcs labelled by a positive integer are called
Z-labelled. Structured nodes are always nonterminal nodes in
the tree. Primitive nodes are always terminal nodes (leaves)
of the tree. All leaves of the tree are e-labelled. All nodes
and arcs of the tree are labelled by a label from LN or LA.

1) Form: The software component has a single, distinguished
node, labelled TASK. This node is the root of the tree.

Interpretation: The subtree of which this node is the root
represents a separately defined, executable software component.

2) Form: A LOOP node always has a single child.

Interpretation: The activity represented by the subtree rooted
at the child of the LOOP node is to be executed a number of
times, ranging from zero to a finite number to be decided
within the LOOP node in an (as yet) unspecified manner.

3) Form: An AND or OR node always has a single child, which
must be a Z-labelled node.

Interpretation: An AND node indicates that the children of its
Z-labelled child must all be executed, while an OR node
indicates that one child of its Z-labelled child must be
executed.

4) Form: Any Z-labelled node, with label n, must also have
outdegree n, and its parent in the tree must be labelled by
either AND or OR. The arcs of which this node is the source
must be labelled by the positive integers {1, 2, 3, ..., n}.

Interpretation: The value of the arc label indicates the order
in which the activity should be executed. The activity
labelled i should be executed before the activity labelled i+1.

In summary, AND represents the execution of a sequence of
(n) activities, OR indicates the selection of 1 (of n)
activities, and LOOP represents the repeated execution of an
activity.

3.1.3.1.1.1 Conditional Expressions

Our current approach to the expressions which control
selections and iterations is to restrict them to be of one of
two types: they may have the form of either a range of values
or a condition (or boolean expression). Figure 3.6 shows an
example of an RSL statement and its graph representation.


RSL statement

    IF FOUND = TRUE
        ALPHA: A1
    OTHERWISE
        ALPHA: A2
    END

Control flow representation


Figure 3.6. An example of the control flow representation.
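As a minimal sketch of the notation defined above, the following Python fragment builds one possible tree for the RSL statement of Figure 3.6 and checks it against Forms 1) to 4). The class `Node`, the function `check_forms` and the exact shape chosen for the tree are illustrative assumptions, not part of the original model definition; the graph of the original figure is not reproduced here. The empty label e is written as the empty string.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union

# A label is "TASK", "LOOP", "AND", "OR", "" (the empty label e) or a positive integer.
Label = Union[str, int]


@dataclass
class Node:
    label: Label
    name: str = ""  # hypothetical display name, e.g. "A1"
    # Each child is stored together with the label of the arc leading to it.
    children: List[Tuple[Label, "Node"]] = field(default_factory=list)


def check_forms(root: Node) -> List[str]:
    """Return the violations of Forms 1) to 4); an empty list means well formed."""
    errors: List[str] = []
    if root.label != "TASK":
        errors.append("Form 1: root must be labelled TASK")

    def visit(node: Node, parent: Optional[Node]) -> None:
        kids = [child for _, child in node.children]
        if node.label == "LOOP" and len(kids) != 1:
            errors.append("Form 2: LOOP must have a single child")
        if node.label in ("AND", "OR"):
            if len(kids) != 1 or not isinstance(kids[0].label, int):
                errors.append("Form 3: AND/OR must have a single Z-labelled child")
        if isinstance(node.label, int):
            arcs = [arc for arc, _ in node.children]
            expected = set(range(1, node.label + 1))
            if len(arcs) != node.label or set(arcs) != expected:
                errors.append("Form 4: node labelled n needs arcs 1..n")
            if parent is None or parent.label not in ("AND", "OR"):
                errors.append("Form 4: parent of a Z-labelled node must be AND or OR")
        if node.label == "" and kids:
            errors.append("primitive (e-labelled) nodes must be leaves")
        for child in kids:
            visit(child, node)

    visit(root, None)
    return errors


# One possible tree for: IF FOUND = TRUE  ALPHA: A1  OTHERWISE  ALPHA: A2  END
a1 = Node("", "A1")
a2 = Node("", "A2")
choice = Node(2, children=[(1, a1), (2, a2)])      # Z-labelled node, outdegree 2
selection = Node("OR", children=[("", choice)])    # select one of the two activities
task = Node("TASK", children=[("", selection)])

print(check_forms(task))  # prints []
```

The condition FOUND = TRUE is not stored in this sketch; how conditions are attached to OR and LOOP nodes is left open here.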
Figure 3.8. Abbreviated representation of the example shown
in Figure 3.6.

3.1.3.1.1.3 Extensions To Other Control Structures

Extensions to this basic form of control flow description
have been defined to describe (unstructured) jumps and
concurrency or nondeterminism. Jumps are included as directed
arcs between two nodes, the arc being specially labelled to
distinguish it from the arcs representing structured control
flow. Concurrency or nondeterminism are included by permitting
Z-labelled nodes to be the source of e-labelled arcs. This
removes the ordering concept described in Form 4) discussed
before, and so permits nondeterminism. A further extension has
been defined to support inclusion of separately defined
software components within another component. This represents
both the SUBNET concept of RSL and the procedure concept of
programming languages, such as PASCAL. With these extensions
the graph is no longer a tree structure, but the non-tree arcs
are distinctively labelled.

Using such an abstract view of control flow, it is
possible to construct, for example, the skeleton of a PASCAL
program from the control flow requirements of an RSL
specification. In addition, the theory of graph modification
[CLAU79] provides us with a foundation for defining
modifications formally, and for relating this formal definition
to modifications which are to be made to software systems'
descriptions in notations which are currently in use.

3.1.3.1.2 Data Flow

Data flow information has also been added to our model.
This information may be viewed as a set of triples of the form:

<EO, IO, OO>

where EO is an executable object (such as a statement or
procedure), and IO (the input object) and OO (the output
object) are data objects (such as program variables). Such a
triple has the interpretation that EO may use the value of IO
to alter the value of OO. In its graphical form, each such
triple denotes the existence of an arc from activity EO,
labelled OO, to some other activity, and from some other
activity to activity EO, labelled IO.
For example, the activity A1, written as an ALPHA in RSL,
appears as:

    ALPHA: A1.
        INPUTS: DATA: D1.
        OUTPUTS: DATA: D2
                 DATA: D3.

and would be represented as:

<A1, D1, D2>

<A1, D1, D3>

unless further information is available. However, if we have
information that D3 is being assigned a value in A1 which is
independent of D1, then we would represent A1 as:

<A1, D1, D2>

<A1, K, D3>

where K is some relevant constant or other independently
defined data object. Figure 3.9 shows the graphical
representation of this latter case.


Figure 3.9. An example of the data flow representation.
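The difference between the two sets of triples can be made concrete with a short sketch. The names `Triple` and `outputs_depending_on` are hypothetical and are introduced only for illustration; the triples themselves are those of the example above.

```python
from typing import List, NamedTuple, Set


class Triple(NamedTuple):
    eo: str  # executable object
    io: str  # input data object
    oo: str  # output data object


def outputs_depending_on(triples: List[Triple], data: str) -> Set[str]:
    """Data objects whose value may be altered using the value of `data`."""
    return {t.oo for t in triples if t.io == data}


# Without further information, every output of A1 may depend on D1.
coarse = [Triple("A1", "D1", "D2"), Triple("A1", "D1", "D3")]
# With the information that D3 is computed from K, independently of D1.
refined = [Triple("A1", "D1", "D2"), Triple("A1", "K", "D3")]

print(sorted(outputs_depending_on(coarse, "D1")))   # ['D2', 'D3']
print(sorted(outputs_depending_on(refined, "D1")))  # ['D2']
```

Under the refined triples a change to D1 can reach only D2, which is the kind of narrowing that the additional information is meant to provide.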
3.1.3.1.3 Data Structures

In the previous section we described a graph representation
for control flow, and indicated that we have also developed a
very similar representation for data structures. We will use
our data flow information to connect representations of data
structures (which we call "input data objects" or "output data
objects") to representations of control flow structures (which
we call "executable objects"). The graph representation of
data structures resembles that used to describe executable
activities, in that sequences of heterogeneous data objects
are represented by trees rooted with an AND node, selections
of one of several data objects are represented by trees rooted
with an OR node, while collections of several homogeneous
objects are represented by trees rooted with a LOOP node. For
example, the data item D1, written in RSL as:

    DATA: D1.
        INCLUDES: DATA: D1-P1
                  DATA: D1-P2
                  DATA: D1-P3.

would be represented as shown in Figure 3.10. In the event
that the subcomponents of D1 are also structures, then their
structure will also become a substructure of D1.


Figure 3.10. An example of data structure representation.


3.1.3.1.4 Data Dictionary

Within the description of each component is a data
dictionary. As is customary, this dictionary contains a
definition of each element of this component, excluding those
which belong to other components but are used within this
component. There are three types of elements which exist in
any component: activities, data and structures.

Activities are defined by the data items which enter them
or leave them. Their definitions also describe the operations
which are performed on that data. These operations include
operations defined by the language, notation or operating
environment, and those carried out by other components of the
system. Examples are the "+" operation of a PASCAL program,
which probably refers to a hardware dependent addition
instruction, and the PASCAL "sin" (sine of an angle) function,
which probably refers to a function in the system's standard
library of functions.

Data are defined only by their structure and external
name. The structure of a data item may be defined by the
language, notation or (less often, but hardware dependent)
operating environment, or by other components of the system.
Examples are the standard file "input" of a PASCAL program,
which refers to a standard system input file (usually the
terminal keyboard), and the PASCAL constant "maxint", which has
the value of the largest integer available to PASCAL programs
in a particular computer system.

Structures are defined in terms of connected
sub-components, which are themselves either data items or other
structures. Subcomponents may refer to structures defined by
the language, notation or (less often, but hardware dependent)
operating environment, or by other components of the system.
Examples are the standard types "text" and "integer" of PASCAL
programs. "Text" refers to the system's implementation of
sequential files of characters, while "integer" is affected by
the available word length and precision of the computer system.

The data dictionary is a sub-structure indexed by an
internal object name, denoting a particular activity, structure
or piece of information. When the object is a named activity,
the index leads to another software component. When the object
is an unnamed activity, the index leads to a description of the
activity (which may be in a formal or informal notation). When
the object is a structure, the index leads to a description of
the form of that structure. The lowest level structures are
those defined by the computer installation. When the object is
a piece of information, the index leads to the definition of
the structure which is contained in the piece of information.

For instance, the previous examples would give rise to the
following data dictionary entries:

<D1, DS1> where DS1 is the structure shown in Figure 3.10,

<D1-P1, DS2>

<D1-P2, DS2>

<D1-P3, DS2>

<A1, CS1> where CS1 is the control flow structure shown in
Figure 3.0.

Furthermore, DS1 and DS2 are the names of data structures to be
found within other components, and D2, D3 and K are also
assumed to be defined within other components.
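The distinction between locally defined names and names assumed to be defined elsewhere can be sketched as follows. This is only an illustration under stated assumptions: the two dictionaries, the function `external_names` and the decision to treat the control flow structure CS1 as held with the component itself are hypothetical choices, not taken from the report.

```python
from typing import Dict, List, Set, Tuple

# Data dictionary of the example component: internal object name -> definition name.
data_entries: Dict[str, str] = {
    "D1": "DS1",
    "D1-P1": "DS2",
    "D1-P2": "DS2",
    "D1-P3": "DS2",
}
# Assumption: the control flow structure of an activity is kept with the component.
activity_entries: Dict[str, str] = {"A1": "CS1"}

# Data flow triples <EO, IO, OO> of the example (the refined form).
triples: List[Tuple[str, str, str]] = [("A1", "D1", "D2"), ("A1", "K", "D3")]


def external_names(data: Dict[str, str],
                   activities: Dict[str, str],
                   flows: List[Tuple[str, str, str]]) -> Set[str]:
    """Names used by the component but not defined in its own dictionary."""
    local = set(data) | set(activities)
    used = set(data.values())        # structure names referred to by data items
    for _, io, oo in flows:
        used.update((io, oo))        # data objects read or written by activities
    return used - local


print(sorted(external_names(data_entries, activity_entries, triples)))
# ['D2', 'D3', 'DS1', 'DS2', 'K']
```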

3.1.3.2 Component Interfaces

Any software component is a separately defined activity in
an overall software system. In order to act in a coordinated
manner, the components must share information with each other
and provide services for each other. We would like to
represent the interdependencies between components in a
disciplined fashion, in a way that permits modifications to be
analyzed and that matches the representation of software
components. Our approach to this problem is to associate an
interface subcomponent with each software component. Within
this interface are defined all of the objects which appear
outside the current component, and also all of the objects
which appear inside the current component but may be used by
other components. These objects are further distinguished
between those which are directly linked to external components
and those which are indirectly linked (as parameters).

The interface graph can be formally defined as follows:
An interface graph is a labelled graph:

G = (N, A, LN, LA, sN, tN, nL, aL)

where

LN = {INTERFACE, GLOBALS, PARAMETERS, IMPORTS, EXPORTS}
     U Z+ U {e}

and LA = {e}.

The graph is a rooted, acyclic, directed graph (acyclic
"digraph"). Nodes labelled by GLOBALS, PARAMETERS, IMPORTS and
EXPORTS are referred to as structured nodes. Nodes and arcs
labelled by e are referred to as e-labelled. E-labelled nodes
are referred to as primitive nodes. Nodes and arcs labelled by
a positive integer are called Z-labelled. Structured nodes are
always nonterminal nodes in the tree. Primitive nodes are
always terminal nodes (leaves) of the tree. All leaves of the
tree are e-labelled. All nodes and arcs of the tree are
labelled by a label from LN or LA.

1) Form: The component interface has five distinguished nodes,
labelled INTERFACE, GLOBALS, PARAMETERS, IMPORTS and EXPORTS.
The INTERFACE labelled node is the root of the digraph. The
other four distinguished nodes are immediate descendants of the
root node. No other node is directly connected to the root
node.

Interpretation: The subgraph of the root node represents the
interface between this component and other components. All
references to or from other components are forced to pass
through this subgraph. This subgraph includes all names and
structural information which is required to complete the
interface with any external component.

2) Form: The primitive nodes of the interface digraph are
directly connected to exactly one of the following nodes
{GLOBALS, PARAMETERS}.

Interpretation: The primitive nodes represent the interface
objects. If a node is connected to GLOBALS, then the object
represented by that node may be accessed directly. If a node
is (instead) connected to PARAMETERS, then the local object
represented by that node provides indirect access to another
object, and this relation between the local object and the
other object may be altered by execution of the system of
components.

3) Form: The primitive nodes of the interface digraph are
directly connected to at least one of the following nodes
{IMPORTS, EXPORTS} (connection to both of these is possible).

Interpretation: The primitive nodes represent the interface
objects. If a node is connected to IMPORTS, then the object
represented by that node may be examined and used, but may not
be altered. If a node is connected to EXPORTS, then the object
represented by that node may be altered.

For the previous example, the interface must include all
of those objects which did not appear in the local data
dictionary, i.e. DS1, DS2, D2, D3 and K. Since RSL will not
permit data structures to be altered within the system, DS1 and
DS2 must be connected to the IMPORTS node alone. In addition,
since RSL has no facility for parameter passing, all five
objects must be connected to the GLOBALS node alone. The data
objects D2 and D3 are both altered within the component, and
hence they should be connected to the EXPORTS node alone. The
constant K is defined externally, and cannot be altered in this
component, and therefore it should be connected to the IMPORTS
node alone. The interface graph is shown in Figure 3.11.


Figure 3.11. An example of a component interface graph with
no PARAMETERS subcomponent.
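The connections described for this example can be written down and checked against Forms 2) and 3) with a few lines of Python. The mapping `interface` and the function `check_interface` are illustrative names only; the sketch records, for each primitive node, the structured nodes to which it is connected.

```python
from typing import Dict, List, Set

ACCESS = {"GLOBALS", "PARAMETERS"}   # Form 2: exactly one of these
DIRECTION = {"IMPORTS", "EXPORTS"}   # Form 3: at least one of these


def check_interface(links: Dict[str, Set[str]]) -> List[str]:
    """Return the violations of Forms 2) and 3) for each primitive (object) node."""
    errors: List[str] = []
    for name, parents in sorted(links.items()):
        if len(parents & ACCESS) != 1:
            errors.append(f"{name}: must be linked to exactly one of GLOBALS, PARAMETERS")
        if not parents & DIRECTION:
            errors.append(f"{name}: must be linked to IMPORTS, EXPORTS or both")
    return errors


# The interface of the example: no object is linked to PARAMETERS.
interface = {
    "DS1": {"GLOBALS", "IMPORTS"},
    "DS2": {"GLOBALS", "IMPORTS"},
    "K": {"GLOBALS", "IMPORTS"},
    "D2": {"GLOBALS", "EXPORTS"},
    "D3": {"GLOBALS", "EXPORTS"},
}

print(check_interface(interface))  # prints []
```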


3.1.4 The Interphase Model

Although our comparison between the features of different
languages (for writing programs, designs and specifications)
and the abilities of our model to represent such features has
shown that we still have many limitations (e.g. no scope
rules, no parameter passing rules), we think that our current
model is suitably complete for us to study properties of
interest when software modifications are proposed.

We now develop a method for expressing "equivalence"
relationships between objects in two different levels of
description. We have already indicated that these
relationships should be expressed as relationships between
graph structures, and have made a preliminary study of methods
to achieve this. It is these equivalence relationships that
provide us with the ability to analyze the effects of
modifications to one level of description on the behavior of
the other.

The interphase model is composed of a set of graph
rewriting rules of the kind described before. On the left-hand
side of each rule is a subgraph of the graph model of the
software system at the end of a particular phase. On the
right-hand side of each rule is a subgraph of the graph model
of the system at the end of the next phase. The rules record
the fact that the subgraph on the left-hand side is to be
replaced by the subgraph on the right-hand side. Thus, if a
change is made to a particular part of a software system, we
can identify its potential impact on the next phase by locating
all rules with that part of the system in their left-hand
sides, and identifying all of the subgraphs in the
corresponding right-hand sides.

In order to constrain possible ripple effects, an
effective restriction on the software development process is to
require that each node of a graph model of a software system
should appear on the left-hand side of exactly one rule. The
effect of adopting this rule is to create a process closely
resembling the "stepwise refinement" process advocated by many
authors (e.g. [WIRT71]), but applied to the refinement also of
data flows and data structures, whereas stepwise refinement is
involved primarily with control flow and executable activities.
For example, given a process structure, using the notation of
Jackson's design methodology [JACK75], shown in Figure 3.12, we
have the model representation shown in Figure 3.13.


Figure 3.12. A process structure using Jackson's design
methodology.

In this case, the interphase model would contain the rules
shown in Figure 3.14. These rules show how the design
structure may be derived from the requirements structure.
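The use of the rules to locate potential impact, and the "exactly one rule" restriction, can be sketched as follows. The rules and node names (R1, R2, M1, M2, M3) are invented for illustration and do not correspond to the rules of Figure 3.14; only node sets are recorded, and the arcs of each subgraph are omitted.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Set


class Rule(NamedTuple):
    lhs: FrozenSet[str]  # nodes of the subgraph in the earlier phase
    rhs: FrozenSet[str]  # nodes of the subgraph in the next phase


def impact(rules: List[Rule], changed: str) -> Set[str]:
    """Next-phase nodes potentially affected by a change to node `changed`."""
    affected: Set[str] = set()
    for rule in rules:
        if changed in rule.lhs:
            affected |= rule.rhs
    return affected


def rule_count(rules: List[Rule], nodes: Set[str]) -> Dict[str, int]:
    """Number of rules whose left-hand side contains each node."""
    return {n: sum(n in r.lhs for r in rules) for n in nodes}


# Hypothetical rules: requirement R1 is realized by modules M1 and M2, R2 by M3.
rules = [
    Rule(frozenset({"R1"}), frozenset({"M1", "M2"})),
    Rule(frozenset({"R2"}), frozenset({"M3"})),
]

print(sorted(impact(rules, "R1")))                               # ['M1', 'M2']
print(all(c == 1 for c in rule_count(rules, {"R1", "R2"}).values()))  # True
```

When every node appears in exactly one left-hand side, as in this example, the impact of a change is confined to a single right-hand side.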


3.2 Construction Of The Software Model

In this section we will present a general technique to
enable the software model in the format described above to be
constructed for any software system. We will first describe
the approach for constructing the model for a particular phase,
and then the approach for constructing the interphase model.

3.2.1 Construction Of The Intraphase Model

The intraphase model describes control flow, data flow and
data structure. Not every method of system documentation
possesses all of these attributes. Nonetheless, these
attributes, which refer to the sequence of activities, the flow
of information and the form of information, are present in all
systems and relevant to all system descriptions.

Since we will use our model for many different notations
to cover several phases, it is necessary to base the
construction of our model on some properties which are
independent of these individual notations. The properties
chosen are based on semantic rather than syntactic properties,
on the assumption that this basis will be sufficiently broad to
support the general aims of the model. For this reason,
construction of the intraphase model for a particular notation
must be preceded by preparation of a semantic definition of
that notation.

Regarding the control flow, the execution sequence is
emphasized. For example, let us consider the BNF production
rule for the PASCAL syntax construct <compound statement>. The
definition is:

<compound statement> ::=
    begin <statement> { ; <statement> } end

For the purposes of having a semantic model, however, we are
more interested in the fact that the list of statements is to
be executed in their order of appearance than in the use of
"begin", "end" and ";" as delimiting tokens of this construct.
Therefore, we would "abstract away" from such a construct to
give:

<compound statement> ::=
    <statement> { --> <statement> },

where S1 --> S2 denotes that S1 should be executed before S2.
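A small sketch of this abstraction step follows. It is an assumption-laden illustration, not a PASCAL parser: the helper names are hypothetical, and only a flat compound statement (no nested begin/end, no ";" inside a statement) is handled.

```python
from typing import List, Tuple


def statements_of(compound: str) -> List[str]:
    """Drop the delimiting tokens of a flat (non-nested) compound statement."""
    text = compound.strip()
    if not (text.startswith("begin") and text.endswith("end")):
        raise ValueError("not a compound statement")
    body = text[len("begin"):-len("end")]
    return [s.strip() for s in body.split(";") if s.strip()]


def precedes(statements: List[str]) -> List[Tuple[str, str]]:
    """Pairs (S1, S2) meaning S1 --> S2: S1 is executed before S2."""
    return list(zip(statements, statements[1:]))


stmts = statements_of("begin x := 1; y := x + 1; z := y end")
print(precedes(stmts))
# [('x := 1', 'y := x + 1'), ('y := x + 1', 'z := y')]
```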

Data flow properties are mainly associated with the
assignment statements. For example, a PASCAL assignment
statement and its data flow properties can be expressed in the
following manner:

<assignment statement> ::= <variable> := <expression>
<variable> ::= <identifier> | <identifier> [ <index list> ]
<index list> ::= <index> { , <index> }
<index> ::= <expression>

AS ::= I --> O (I = V.I U E.I; O = V.O U E.O)
V.O ::= {id} U IL.O,  V.I ::= IL.I
IL.I ::= ind.I { U ind.I },  IL.O ::= ind.O { U ind.O }
ind.I ::= E.I,  ind.O ::= E.O

To display data structure properties, we note that they
come from the structure of declared objects. In PASCAL, for
example, an array structure has the following representation:

<array declaration> ::=
    array [ <index range list> ] of <type>

<index range list> ::=
    <index range> { , <index range> }

with the interpretation:

[Diagram: a row of <index> nodes linked in sequence by <succ>
arcs, each <index> node pointing to its own <type> node.]

Therefore, definition of the intraphase model for a
particular notation is largely a manual procedure. The
notation must be analyzed to identify the features determining
the order of events, the flow of information and the form of
information. These features of the notation must be
characterized in terms of the basic elements of the model.

3.2.1.1 Definition Procedure

1) For each construct in the language definition which
corresponds to a distinct activity, define an entity.

2) For each construct in the language definition which defines
a sequencing relationship between other constructs, define
a relational entity.

3) For each construct in the language definition which defines
information flow into or out of an activity, define a
triple <activity, I-set, O-set>.

4) For each construct in the language definition which defines
the form of a piece of information, define a <name,
structure> pair, where <structure> is derived from the form
of the information.


3.2.1.2 An Example For Constructing An Intraphase Model (An
RSL Subset)

Here we describe the construction of a model for a subset
of the Requirements Statement Language (RSL) [ALFO77]. In the
following definition, the nonterminal symbols are delimited by
"<" and ">", optional symbols are delimited by "[" and "]",
choices to be made between symbols are delimited by "|", and
symbols to be repeated are delimited by "{" and "}", with
preceding and succeeding integers to denote the lower and upper
bounds respectively on the number of iterations. The
definition of the subset of the language now follows:

<new element definition> ::=
    [DEFINE] element-type-name element-name [comment].
    0{ [INSERT] <element definition sentence> }n

<element definition sentence> ::=
    <attribute declaration>
  | <relation declaration>
  | <structure declaration>

<attribute declaration> ::=
    attribute-name 1{ value-name | number | text-string }1
    [comment].

<relation declaration> ::=
    relation-name [relation-optional-word]
    1{ [element-type-name] element-name [comment] }n.

<structure declaration> ::=
    STRUCTURE 2{ <node> }n END [comment].

<node> ::=
    <element node>
  | <terminator>
  | <and node>
  | <or node>
  | <for-each node>

<element node> ::=
    [element-type-name] element-name [comment]

<terminator> ::=
    TERMINATE [comment]
  | RETURN [comment]

<and node> ::=
    DO [comment] <branch>
    1{ AND <branch> }n
    END

<branch> ::=
    1{ <node> }n

<or node> ::=
    IF [comment] <conditional branch>
    0{ OR <conditional branch> }n
    OTHERWISE [<branch>]
    END

<conditional branch> ::=
    [unsigned-integer] <condition> <branch>

<for-each node> ::=
    FOR EACH [FILE] file-name [RECORD]
    [SUCH THAT <condition>]
    DO [comment]
    1{ [ALPHA] alpha-name [comment]
     | [SUBNET] subnet-name [comment]
    }1
    END

<condition> ::= (<Boolean expression>)


Now, following the definition procedure given in the last
section, we have the following steps:

1) Construct an entity for each ALPHA, SUBNET and STRUCTURE
in the software system's requirements.

2) Construct relational entities for each <and node>,
<terminator>, <or node> and <for-each node>.

3) Define triples <alpha-name, I-set, O-set>, <subnet-name,
I-set, O-set> and <structure-name, I-set, O-set> for each
ALPHA, SUBNET and STRUCTURE.

4) Define pairs <data item name, structure> from the
definitions for DATA items.
3.2.1.3 Implementation Of The Intraphase Model

Since the intraphase model is based on a semantic
definition of a notation, it is clear that construction of the
model should proceed from semantic analysis. Semantic analysis
is most commonly carried out by a compiler (or an interpreter),
in conjunction with a parser, which constructs the necessary
syntax constructs with which semantic rules are associated.
This is the approach which we will adopt for construction of
the intraphase model.


To implement the intraphase model for a particular
notation, the first step must be to develop a parser for that
notation. Obviously this is only possible when the notation
has been formally defined. If no formal definition is
available, then one must be defined, or the parsing process
must be replaced by a manual inspection process.


When we have developed a parser to recognize the notation
in question, we will then modify it to produce the nodes and
arcs of our model. The first step is to identify those
constructs which constitute the primitive activities of the
system from the definition of the notation. Now identify the
syntax constructs which control the sequence of primitive
activities and determine their effect on the activity sequence.
Next identify by what rules these activities can use or alter
the objects of the system. Now modify the parser in the
following manner:

1) When the construct is a primitive activity, produce a
"primitive activity" node.

2) When the construct is a program object, produce a "program
object" node.

3) When the construct combines several activities, produce a
"combinator" node and connect it to the primitive
activities.

4) When the construct is a "callable" entity, produce a
"component" node and connect it to its activities.

5) When the construct refers to a "callable" entity, produce a
"primitive activity" node, but connect it to the
"component" node.

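The five parser actions above can be sketched as a small builder that walks already-parsed constructs. Everything in this fragment is a hypothetical illustration: the tuple encoding of constructs, the class `ModelBuilder` and the tags "activity", "object", "combine", "define" and "call" are assumptions, and a callable entity is assumed to be defined before it is referred to.

```python
from typing import Dict, List, Tuple


class ModelBuilder:
    """Collects the nodes and arcs produced by the five parser actions."""

    def __init__(self) -> None:
        self.nodes: List[Tuple[int, str, str]] = []  # (id, kind, text)
        self.arcs: List[Tuple[int, int]] = []        # (source id, target id)
        self.components: Dict[str, int] = {}         # callable name -> node id

    def _node(self, kind: str, text: str) -> int:
        self.nodes.append((len(self.nodes), kind, text))
        return len(self.nodes) - 1

    def build(self, construct: tuple) -> int:
        tag = construct[0]
        if tag == "activity":   # action 1
            return self._node("primitive activity", construct[1])
        if tag == "object":     # action 2
            return self._node("program object", construct[1])
        if tag == "combine":    # action 3: ("combine", kind, [constructs])
            parent = self._node("combinator", construct[1])
            for child in construct[2]:
                self.arcs.append((parent, self.build(child)))
            return parent
        if tag == "define":     # action 4: ("define", name, body)
            comp = self._node("component", construct[1])
            self.components[construct[1]] = comp
            self.arcs.append((comp, self.build(construct[2])))
            return comp
        if tag == "call":       # action 5: ("call", name)
            call = self._node("primitive activity", "call " + construct[1])
            self.arcs.append((call, self.components[construct[1]]))
            return call
        raise ValueError(f"unknown construct: {tag}")


builder = ModelBuilder()
builder.build(("define", "P", ("combine", "AND",
               [("activity", "x := 1"), ("activity", "y := x")])))
builder.build(("define", "MAIN", ("combine", "AND",
               [("call", "P"), ("activity", "z := y")])))

print(len(builder.nodes))  # 8
```

The call to P inside MAIN yields a "primitive activity" node whose only arc leads to the "component" node of P, as in action 5.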
3.2.2 Construction Of The Interphase Model

By the nature of the interphase model, it is clear that it
depends on the notations used at each phase. Nonetheless,
certain general principles provide general assistance in its
construction.

The form of an interphase model is simply the form of a
graph rewriting system, in which the previous intraphase model
plays the part of the initial (axiom) graph. For each
rewriting rule, the left-hand side must include nodes and arcs
which are part of the axiom graph, while no right-hand side may
include any node or arc from the axiom graph unless the same
node or arc also appears in the left-hand side of the rule.


The interphase model should be constructed by the software
development team, and must be updated by software maintenance
personnel who modify the system. In the event that no such
model exists to support the software maintenance personnel,
they must construct it from the existing documentation of the
system. Heninger [HENI79] has described a successful
maintenance project in which the software requirements for a
complex flight control system were constructed by examination
of existing documentation and discussion with users and
developers of the system. The interphase model can be
constructed by following that procedure and recording the
relationships identified between the requirements derived and
the code being examined.

Of course the simplest way to define graph rewriting rules
to replace a graph G by another graph H is simply to use a
single rule G ==> H. While this is both accurate and
permissible, it is of little use to maintenance personnel
because this is already an assumed rule, used by any
maintenance programmer who trusts the software documents from
which G and H were constructed.

On the other hand, if we define graph rewriting rules so
that each left-hand side has exactly one node, then we are
placing restrictions on the developer of H, since certain
permissible graph structures cannot be constructed using this
restriction [JANS80], although we are greatly assisting the
maintenance programmer in performing tracing throughout the
system. This effect is strengthened if we permit the
node on the left-hand side to have only a single representative
node on the right-hand side, which restricts the possible growth
of node interconnections.

While  it  may  appear  to  be  undesirable  to  restrict  the 
range  of  possible  solutions  available  to  the  developer,  a 
discipline  in  the  use  of  development  processes  does  assist  the 
maintenance programmer. In addition, since the number of arcs
in the graph is a measure of the degree of interconnection of
the graph, it is an indicator of both the complexity and the
stability of the system. Therefore, given McCabe's measure of
cyclomatic number for program complexity [MCCA76] and our
experience with the effects of interconnectivity on program and
design stability [YAU80e, 82c], any development process which
raises this degree of interconnectedness must be considered a
source of increasing system complexity and instability.
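
The link between arc count and complexity can be made concrete with a small sketch. It assumes the usual statement of McCabe's cyclomatic number, V(G) = e - n + 2p, for a graph with e arcs, n nodes and p connected components; the function name and the sample graph sizes are illustrative only and are not taken from this report.

```python
def cyclomatic_number(num_arcs: int, num_nodes: int, num_components: int = 1) -> int:
    """McCabe's cyclomatic number V(G) = e - n + 2p."""
    return num_arcs - num_nodes + 2 * num_components


# Hypothetical graph: 8 nodes, 9 arcs, one connected component.
before = cyclomatic_number(num_arcs=9, num_nodes=8)

# A development step that adds one arc and no nodes.
after = cyclomatic_number(num_arcs=10, num_nodes=8)
```

Under these assumptions every arc added without a corresponding node raises the measure by one, which is the sense in which a development process that increases interconnection also increases complexity.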

3.2.2.1 Definition Of The Interphase Model

In order to construct an interphase model, it is necessary
that two phase models already exist. We will refer to these as
the source and target intraphase models, and we will say that
the target model is derived from the source model. Having thus
defined our terminology, we now state the definition procedure.

1) For each node in the source model, assign it to the
left-hand side of one rule.

2) For each rule, assign a sub-graph of the target model to
its right-hand side.

3) For each node in the left-hand side which is part of the
interface between the subgraph and the complete graph
model, indicate to the user any arcs by which it may be
connected to the rest of the model. According to the
user's response, select one of the following steps.

3a) If the arc is not to appear in the next phase, the
developer must enter an explanation for this omission.

3b) If the arc is to be represented in the next phase, the
developer must identify the node or nodes of the right-hand
side which "represent" the node under consideration. Each
such node on the right-hand side should be given a unique
label. The node on the left-hand side should be given a
list of labels, made up of all of the labels which were
just assigned to the right-hand side.

3c) If, in attempting to obey the instructions in the previous
step, it is found that the arc is not represented by an arc
or simple set of arcs during the next phase, then we should
add the arc to the left-hand side, decide if the left-hand
side can be divided into simpler subgraphs, and modify the
right-hand side in a corresponding fashion.
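
The bookkeeping implied by steps 1) to 3b) can be sketched as a small data structure with a consistency check. This is only an illustration under stated assumptions: the `Rule` class, its field names, the `check_model` function and the sample node names are all hypothetical, and step 3c), which depends on the developer's judgment, is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """One rewriting rule: a source node rewritten into a target subgraph."""
    lhs: str          # the single source-model node (step 1)
    rhs_nodes: set    # nodes of the target-model subgraph (step 2)
    # Step 3b: labels of the target nodes which "represent" the source node.
    representatives: list = field(default_factory=list)
    # Step 3a: arcs omitted from the next phase, each with an explanation.
    omitted_arcs: dict = field(default_factory=dict)


def check_model(source_nodes, target_nodes, rules):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    lhs_seen = [rule.lhs for rule in rules]
    for node in source_nodes:
        count = lhs_seen.count(node)
        if count != 1:
            problems.append(f"{node}: on {count} left-hand sides, expected 1")
    for rule in rules:
        if not rule.rhs_nodes <= set(target_nodes):
            problems.append(f"{rule.lhs}: right-hand side leaves the target model")
        if not set(rule.representatives) <= rule.rhs_nodes:
            problems.append(f"{rule.lhs}: representative outside right-hand side")
        for arc, reason in rule.omitted_arcs.items():
            if not reason.strip():
                problems.append(f"{rule.lhs}: omitted arc {arc} lacks an explanation")
    return problems


# Hypothetical models: source nodes A, B, C and target nodes P, Q, R.
rules = [
    Rule("A", {"P", "Q"}, representatives=["P"]),
    Rule("B", {"R"}, representatives=["R"], omitted_arcs={("A", "B"): ""}),
]
problems = check_model(["A", "B", "C"], ["P", "Q", "R"], rules)
```

In this hypothetical run the check reports two problems: node C has not been assigned to any rule, and the arc omitted by the rule for B carries no explanation.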


3.2.2.2 Implementation Of The Interphase Model

Since  the  interphase  model  is  merely  a  collection  of  graph 
rewriting  rules  describing  the  process  of  deriving  one 
intraphase  model  from  another,  implementation  of  the  interphase 
model  should  consist  merely  of  recording  the  development 
process  as  it  is  carried  out.  To  do  this,  we  will  require  that 
the  development  team  uses  the  discipline  of  recording  the  fact 
that  a  certain  portion  of  a  milestone  document  is  to  be 
replaced  by  a  certain  portion  of  a  later  milestone  document. 
It  is  then  necessary  for  a  software  tool  to  translate  the 
portions of the two documents into subgraphs of the intraphase
models of these two documents. This requires an ability to
recognize the relationship between a software document and its
model, but this recognition is achieved via the tools which
implement the intraphase model. We have already demonstrated
an implementation of this concept at the program code level, in
which the text of the program and its internal representation
are kept in step by means of a syntax-directed editor and an
interactive pretty-printer. By analogy with that system,
recording the fact that two subgraphs may be used to form one rule
of the interphase model requires that we select the portions of
the documents which correspond to each subgraph, then extract
the portions of the internal representations which have been
selected, and finally record the results as a part of the
interphase model.


3.3 A Technique For Specifying Software Modification Proposals


In this section we will describe how to identify all items
of a software system which may need to be changed as a result
of a change request. First we will describe how these items
may be identified within the description of a particular phase
("intraphase tracing"), then we will describe how these items
may be identified within the description of other phases
("interphase tracing"). A software modification proposal
consists of a list of all items which need to be changed, and a
description, prepared by the maintenance programmer, of the
change which must be made to each item.

3.3.1  Intraphase  Tracing 

The model of the system at each phase describes the
control flow, data flow and data structures of the system
during that phase. We are interested in tracing the effects of
changes made to this phase of the system on other portions of
the system. The problems are largely identical to those which
we have already worked on for program modification, whose
solutions we have called "logical ripple effect analysis"
[YAU80b] and "performance ripple effect analysis" [YAU80c,
80f]. For that reason, we may use substantially the same
approach for tracing the effects of changes during other
phases. In fact, the problems of performing ripple effect
analysis at the program level are reduced during other phases
since the complex problems of aliasing and recursion are less
likely to arise. In addition, the likely reduction in the size
of the model to be traced makes ripple effect analysis
techniques even more attractive. In developing our measure for
design stability, we have already discussed the use of ripple
effect analysis techniques at the design level. Hence, it will
not be difficult to use that technique to detect potential
ripple effects for that phase. In addition, since the
intraphase model has a similar structure, independent of any
particular phase, the approach will also be independent of the
phase at which it is applied.

In all of our previous work on analyzing the effects of
program modifications, we have
restricted our definition of logical ripple effects to those
potential changes in program behavior which may result from a
change in the values of data items in the program. We have
been able to identify which values may change by performing
logical ripple effect analysis. Our later work on realizing
program modifications helped us to identify a different,
though relatively minor, type of ripple effect: the effects on
the syntactic correctness of the program being modified. While
those effects would be detected by a compiler, the ability of
our program editor to detect them at the time the modification
is being made is of great value to the maintenance programmer.
For example, the ripple effects of a modification may lead to
undeclared identifiers, because their declaration has been
deleted. This kind of effect is handled by the
syntax-directed editor which will be discussed in Section 4.4.

This raises the question, then, whether there are other
ripple effects of program modification which we are not yet
able to detect. If so, how can we extend our approach to cover
all of these effects? Furthermore, how can we be certain that
no other types of modification effect can exist? Our response
to these questions has been the development of a semantic model
with the intention of modelling all of the semantic
properties of the system. By comparing our semantic model for
a particular notation with the standard semantic definition for
that notation, we can at least determine the completeness of
our model for that notation. Thus, if no semantic changes must
be made to our model of a system in this notation, we can
deduce that no semantic changes will occur in the system.
Since the "semantics" of a system is synonymous with its
"logical behavior", it follows that no other logical ripple
effects may occur. It is not so clear, however, that no other
performance ripple effects may occur. In addition, the proof
of completeness must be carried out independently for each
notation under consideration, and the notion of "completeness"
must be understood to be limited by the completeness of the
standard semantic definition of the notation (for example,
certain decisions may be left to the implementors of the
notation).

However, the fact that intraphase tracing is being
performed in a multi-phase context makes it more probable that
the results of tracing within a phase are reflections of
similar tracing results at a previous phase. Therefore, these
effects should be anticipated. Other results of the tracing
phase will not have been anticipated; these are the effects
due to the approach used in the implementation of this level.
While it is clear that certain effects at one phase follow
inevitably from changing a previous phase, these other effects
appear to be less desirable, since they are not a consequence
of the problem, but a consequence of the development process.
These effects reduce the maintainability of the system as a
whole and require the maintenance programmer to study more of
the system before modifying it.


3.3.1.1 An Example Of RSL Modification


This example shows an RSL R_Net being modified. The R_Net
is shown in Figure 3.15, and its MODEL representation is shown
in Figure 3.16.


Now  let  us  change  ALPHA  A1  to  be 

ALPHA: A1.
    INPUTS:  DATA: D1
             DATA: D4.
    OUTPUTS: DATA: D2

R_NET: RN001.
    STRUCTURE:
        INPUT_INTERFACE: I1
        ALPHA: A1
        DO ALPHA: A2
        AND ALPHA: A3
            ALPHA: A4
        END
        ALPHA: A5
        OUTPUT_INTERFACE: O1
    END.

ALPHA: A1.
    INPUTS:  DATA: D1.
    OUTPUTS: DATA: D2
             DATA: D3.
ALPHA: A2.
    INPUTS:  DATA: D2.
    OUTPUTS: DATA: D4.
ALPHA: A3.
    INPUTS:  DATA: D3.
    OUTPUTS: DATA: D4.
ALPHA: A4.
    INPUTS:  DATA: D4.
    OUTPUTS: DATA: D4.
ALPHA: A5.
    INPUTS:  DATA: D4.
    OUTPUTS: DATA: D5.

Figure 3.15. RSL R_Net and associated alphas.


Then intraphase analysis should implicate activity A1 and data
items {D2, D3, D4}. It may also be necessary to implicate the
activities which provide the value of D4. These should then go
on to implicate additional elements which use these implicated
elements, as given in the data flow triples.
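
The propagation described above can be sketched as a worklist closure over the data flow of Figure 3.15. This is an illustration, not the tool used in this project: the `trace` function and the dictionary encoding are hypothetical, A1 is entered as modified above (only the legible part of its new OUTPUTS list is used), and the starting set {D2, D3, D4} is taken directly from the text rather than derived.

```python
from collections import deque

# alpha -> (input data items, output data items), after the change to A1.
ALPHAS = {
    "A1": ({"D1", "D4"}, {"D2"}),
    "A2": ({"D2"}, {"D4"}),
    "A3": ({"D3"}, {"D4"}),
    "A4": ({"D4"}, {"D4"}),
    "A5": ({"D4"}, {"D5"}),
}


def trace(alphas, seed_alphas, seed_data):
    """Implicate every alpha that uses an implicated data item, and its outputs."""
    implicated_alphas = set(seed_alphas)
    implicated_data = set(seed_data)
    work = deque(seed_data)
    while work:
        item = work.popleft()
        for name, (inputs, outputs) in alphas.items():
            if item in inputs and name not in implicated_alphas:
                implicated_alphas.add(name)
                for out in outputs:
                    if out not in implicated_data:
                        implicated_data.add(out)
                        work.append(out)
    return implicated_alphas, implicated_data


alphas_hit, data_hit = trace(ALPHAS, {"A1"}, {"D2", "D3", "D4"})
```

In this sketch every alpha of the R_Net ends up implicated, including A2, A3 and A4, which provide the value of D4, and D5 joins the implicated data items because A5 uses D4.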


[Figure not recoverable. Legible fragments: entries "Wrong" and "Fail"; caption fragments "Consequences of possible c..." and "good and bad ... code with goo...".]

Even here, some interpretation of the terms "OK" and
"Wrong" is necessary. Thus, an assertion may be passed by some
segment of code, and the assertion may be guaranteed by the
code, although the assertion which was needed to ensure the
correctness of the program should have been stronger, and is
not guaranteed by the code. We would have to consider such an
assertion to be wrong because it does not accurately state what
was required by that segment of code.

3.3.2  Interphase  Tracing 

We first illustrate our approach to interphase tracing
with an abstract example; we then go on to define the procedure
to be used for each type of "primitive" modification activity,
and finally illustrate this with an example.

3.3.2.1 An Example Of Interphase Tracing

We will show how the effects of changes made to the
software system shown in Figure 3.18 can be traced, using the
tracing rules of Figure 3.19, to the next level of system
decomposition, shown in Figure 3.20. Note that, in the set of
rules given in Figure 3.19, each node of the system appears on
the left-hand side of exactly one rule. We keep to this
restriction throughout our approach, since other rules involve more
complexity both for design and tracing of the software system.




Figure  3.19.  Tracing  rules  between  two  phases  of 
an  abstract  software  system. 


Figure  3.20.  The  next  phase  of  the  abstract  software  system. 


Now, we would like to consider the effects of
modifications to the original structure of the system.

3.3.2.1.1 Simple Modification

Consider modifying the node labelled C shown in Figure
3.18. This node seems to be fairly localized in the original
graph and hence its ripple effect should not be too large. Let
us show the detailed steps.

1. Local ripple effect analysis would require us to examine
the nodes B and E (primarily E) because they are directly
connected to node C. Let us assume that B and E need not
be changed.

2. The tracing rules in Figure 3.19 show that the node
labelled C appears in R4, being traced to a subgraph on the
right-hand side of rule R4.

3. Now, we must determine what parts of the right-hand side of
R4 must be changed. Let us assume that we decide to
replace it by the alternate right-hand side of R4 which is
shown in Figure 3.21, and which we may consider to be the
addition of a new feature (Y) together with the
modification of an existing feature (U). Now, we must do
some local ripple effect analysis of this new right-hand
side of R4, as a result of the insertion of the node
labelled Y and the modification of the node labelled U to
U'. However, we anticipate that the right-hand side of a
rule should be small enough to do the analysis thoroughly,
perhaps even by hand.

4. Now, no further ripple effect analysis of the modified
rewritten software system is needed, since the node
labelled X is the only embedding item in the rule. Hence,
if local ripple effect analysis has been done on node X (in
Step 3, the last step), no further ripple effects can
occur.

5. Trace the effects to the next level.

- The modification to the node labelled U can be handled in
the same way as the modification to the node labelled C
at the previous level.

- The insertion of the node labelled Y must be handled
differently, one reason being that there is no
rule established for the new node (Y).

Figure 3.21. A new right-hand side for rule R4.
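
The lookup and comparison performed in steps 2 to 5 can be sketched as follows. The contents of the rules in Figure 3.19 are not reproduced in this text, so the right-hand side given for R4 here is hypothetical, chosen only to agree with the node names (X, U, Y) used in the steps above; `rule_for` and `classify_changes` are illustrative names.

```python
def rule_for(rules, node):
    """Return the name of the single rule whose left-hand side is `node`."""
    matches = [name for name, (lhs, _) in rules.items() if lhs == node]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one rule for {node}, found {len(matches)}")
    return matches[0]


def classify_changes(old_rhs, new_rhs, modified):
    """Sort the nodes of a replaced right-hand side by the kind of change."""
    return {
        "inserted": new_rhs - old_rhs,
        "deleted": old_rhs - new_rhs,
        "modified": modified & new_rhs,
        "untouched": (old_rhs & new_rhs) - modified,
    }


# Hypothetical content of rule R4: node C is rewritten into nodes X and U.
rules = {"R4": ("C", {"X", "U"})}

name = rule_for(rules, "C")
changes = classify_changes(rules[name][1], {"X", "U", "Y"}, modified={"U"})
```

With these assumed contents, U is classed as modified and is traced like C, Y is classed as inserted and falls under the insertion procedure, and X, the embedding item, is untouched.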


3.3.2.1.2 Insertions

This procedure should be followed for newly inserted nodes.

1. For each neighbor of the new node, determine if its rule
(as before, we are assuming that there is only one rule for
each node) should include the new node. If the node should
be included, modify the left-hand side of the rule, and
proceed as before for modifications to nodes (as for B and
U before). Otherwise, go to Step 2.

2. Define a new tracing rule, whose left-hand side is the new
node, and whose right-hand side is a refinement of the
semantic definition of the new node.

3. Determine how the right-hand side should be fitted into the
new level (the embedding problem).

4. Perform ripple effect analysis at the new level, to make
sure that the new right-hand side "fits". Make new
modifications as required.

5. Repeat the process for inserted and modified nodes at the
next level.

3.3.2.1.3 Deletions

This procedure should be followed for nodes which are to be
deleted. We will use the same example, with a new substitution
for R4, to illustrate this procedure.

1. Let us assume that the alternate right-hand side of R4 is
instead the one shown in Figure 3.22. Again, we should do some
local ripple effect analysis of this graph. In this case,
there is not much to inspect.


Figure  3.22.  An  alternative  right-hand  side  for  rule  R4. 

2. In addition (as before), since the node labelled X, the
gluing item, is not affected, no further ripple effect
analysis is needed.

3. Trace the effects to the next level.

Now, at the next level, we must first deal with the
tracing rule involving the deleted node, labelled U. If the
rule is a node replacement rule, then that rule may be deleted.
Alternatively, since that rule will no longer be applicable, it
may be left alone for "garbage collection". At this point,
our strategy depends on whether we store the descriptions
statically or store the tracing rules and allow the
descriptions to be generated dynamically (the standard
space/time tradeoff).

3.4 Discussion And Future Work

The results presented here deal with problems which have
been largely ignored in the area of software engineering in
general, and the area of software maintenance in particular.
Yet, it is clear that we are dealing with problems which must
be resolved by software maintenance personnel if they are to
improve their productivity. This work is most closely related
to that of software configuration management, in which the
ability to trace software elements between different phases is
emphasized, with the aim of improving the quality of a
delivered software product. Our results represent a
considerable improvement over software configuration management
approaches, since we trace not only software elements, but also
their interrelationships. This is particularly valuable to
software maintenance personnel, who must eliminate all
undesired side-effects of their modification activity.

The  value  of  these  results  is  limited  by  the  absence  of 
practical  experience  in  using  the  approach  with  any  software 
system.  The  effort  needed  to  implement  tools  to  support  this 
approach would undoubtedly be considerable, even if these tools
were restricted to particular well-defined notations for
requirements, design and coding. In addition, we have not
dealt with the question of the adequacy of representing only
the control flow, data flow and data structures of a software
system. Our model is a semantic model for software systems,
most closely related to operational semantic definitions.
Existing operational approaches have been used for software
requirements [ZAVE81, 82], design [HAY74] and programming
language definition [LEE72], [PAGA81]. Hence, the success
achieved in these areas suggests that our approach is
sufficient.

While some analysis of the approach remains to be performed,
new questions have been raised by the results already obtained.
First of all, it is clear that the approach
has implications for the software development process.
Currently,  it  is  not  customary  for  developers  to  record  any 
information  regarding  the  process  of  refining  a  system  between 
different  phases,  although  this  information  is  clearly 
available.  Using  our  interphase  model  this  refinement  process 
can  be  recorded,  so  that  the  maintenance  personnel  can  make  use 
of  it.  This  leads  us  to  ask  if  this  information  can  be 
automatically  extracted  using  other  software  tools  in  a 
software  engineering  environment.  In  addition,  the  realization 
that  the  refinement  process  will  become  a  part  of  the  system 
documentation should encourage software developers to consider
how this process should be carried out. It is clear that the
refinement process affects the quality of the final software
system. Bowles [BOWL83] has shown that the complexity of a
software design may be used to predict the complexity of a
program developed from that design, under certain assumptions
about the refinement process. His results may be considered
together with our work to study the effects of different
processes and the degree to which they permit additional

4.0 REALIZATION OF SOFTWARE MAINTENANCE PROPOSALS


Realizing a program modification proposal can be an
expensive and unreliable process. We have developed an
approach for making program modifications more quickly and more
accurately. Our approach uses a syntax-directed editor which
operates on a formal model of the program. Using this editor
ensures that modifications will always leave the program in a
syntactically correct state. If a modification results in a
syntactic inconsistency, this editor will advise the programmer
of that fact and indicate where further modifications would be
needed.

As an aid to the maintenance programmer, our approach will
also use a program slicer [WEIS81, 82] in conjunction with the
program editor to display those sections of the program which
may affect the program code under investigation.

4.1 Overview

The overall procedure for our approach to this incremental
process of program modification is shown in Figure 4.1. We
assume that the programmer has made a preliminary decision as
to what types of changes must be made, based on a given
modification request. Examples of the type of information
which the programmer should have are the particular functions
to be changed, the data values which are in error, or the
additional functions required of the program.

[Figure 4.1: only the box labels "Slicer and Editor" and "Editor" are legible.]

Figure 4.1. The procedure for incremental program
modification.

Our approach for the programmer to make the modification can
be summarized as follows:

(1) Based on the information obtained during the preliminary
analysis of the proposed change, locate the program modules
to which modifications must be made.

(2)  Use  an  interactive  "program  slicer"  to  identify  the  portion 
of  the  program  which  directly  affects  the  program  code  and 
data  values  to  be  changed. 

(3)  Decide  what  changes  must  be  made  to  the  code  selected  by 
the  program  slicer. 

(4)  Use  a  syntax-directed  editor  to  make  the  modifications  to 
the  program  code.  This  editor  will  guarantee  that  the 
changes  preserve  the  syntactic  correctness. 
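
Step (2) can be illustrated with a deliberately small backward slice. This is a sketch under strong assumptions, not the slicer built for this project: it handles straight-line code only (no branches, loops or procedure calls), each statement is given as a hypothetical triple of text, used variables and defined variables, and `backward_slice` is an illustrative name.

```python
def backward_slice(statements, target):
    """Return the indexes of the statements that can affect statement `target`.

    `statements` is a list of (text, uses, defs) triples for straight-line code.
    """
    relevant = set(statements[target][1])   # variables the target statement uses
    in_slice = [target]
    for index in range(target - 1, -1, -1):
        _, uses, defs = statements[index]
        if defs & relevant:
            # This statement defines a relevant value, so it joins the slice
            # and the variables it uses become relevant in turn.
            in_slice.append(index)
            relevant = (relevant - defs) | uses
    return sorted(in_slice)


program = [
    ("a := 1",     set(),  {"a"}),
    ("b := 2",     set(),  {"b"}),
    ("c := a + 1", {"a"},  {"c"}),
    ("d := b * 2", {"b"},  {"d"}),
    ("e := c + 5", {"c"},  {"e"}),
]
slice_for_e = backward_slice(program, target=4)
```

For the last statement of this sample program the slice keeps statements 0, 2 and 4 and omits the two statements that only concern b and d, which is the kind of reduction the programmer relies on in step (3).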

In order to support this approach, we have developed a
system which incorporates two major software tools: the
program slicer and the syntax-directed editor. Figure 4.2
shows the organization of this system. The editor consists of
three basic modules: an interactive pretty-printer for
displaying the status of the program being modified, an
incremental analyzer for analyzing the legitimacy of the
modifications being made to the program and for updating data
flow information, and a recursive-descent parser for parsing
user-supplied textual information. A small routine, the
"manager", is created for supervising the control flow of the
system.

Upon receiving slicing commands from the user, the
manager invokes the program slicer. Upon receiving editing
commands from the user, it invokes the program editor. A
number of utility programs have been attached to the system: a
converter for converting existing programs to our program
representation (i.e. program model), a dumper for sequentially
listing the nodes contained in the representation, and a batch
pretty-printer for producing a well-indented source code
listing.

4.2  The  Program  Representation 

Most  existing  syntax-directed  editing  environments  store 
the  syntactic  information  of  programs  in  the  form  of  abstract 
syntax trees. Depending on the level of abstraction, there may
exist a variety of abstract syntax trees. The hierarchical
structure of a program is thus represented by the syntax tree.
We feel that a program representation to be used in an
interactive syntax-directed programming environment must meet
the following criteria:

1) The representation must be formally defined, based on a
formal specification.

2) The representation must be constructed without losing any
of the syntactic information contained in the program.

3) The representation must present all features of the
language in a uniform manner, so that a variety of tools
can be easily integrated [WASS82].

4) The representation must support incremental program
modification. That is, whenever a modification is made to
a program, only part of the program needs to be updated and
re-analyzed.


We  have  developed  a  tree-like  representation  for  programs, 
which  is  based  on  the  BNF  notation  frequently  used  for  formally 
describing  particular  programming  languages,  and  resembles  the 
parse  tree  used  by  compilers.  Our  tree  representation  consists 
of a well-defined set of node types, each of which corresponds
to  a  syntactic  construct  of  the  language.  Definition  of  the 
representation  for  a  particular  programming  language  can  be 
done  using  a  procedure  which  operates  on  an  annotated  BNF 
description  of  the  language. 


4.2.1 Data Flow Extensions To The Basic Representation


In addition to recording the abstract syntax and static
semantics of a program, the program representation has been
extended to include some data flow information. This data flow
information takes the form of two attributes attached to each
statement, which list,
respectively, the set of variables whose values may be used in
that statement and the set of variables whose values may be
defined by that statement. This data flow information can be
constructed from the BNF notation for a simple PASCAL-like
language as shown below, where the used variables are referred
to by the attribute I (denoting input) and the defined
variables are referred to by the attribute O (denoting output).

(1)  <block>  ::=  begin  <statements>  end 

Here,  a  block  can  be  the  program  main  routine,  a  procedure 
or  a  function  body. 

<block>.I  =  (statements  >. I  -  <  xlx  is  a  local  variable  > 
<block>.0  =  (statements >.0  -  C  x!x  is  a  local  variable  > 

(2)  (statements)  <statement> 

(statements >. I  =  ( st atement > . I 
(statements >. 0  =  (statement >. 0 

(3)  ( st atements > '  ( st atements >"  ;  (statement> 

(statements >'.  I  =  ( st atements >". I  +  ( st atement >. I 
<statements>'.O = <statements>".O + <statement>.O

(4)  <statement> ::= <assignment> | <procedure statement> |
                     <for statement> | <while statement> |
                     <repeat statement> | <if statement> |
                     <case statement> | <compound statement>

     In this case, all the attributes are preserved.

(5)  <assignment> ::= id := <expression>

     <assignment>.I = <expression>.I
     <assignment>.O = <expression>.O + { id }

(6)  <procedure statement> ::= id ( <actual parameters> )

     In this case, let <block> be the corresponding procedure
     body, then

     <procedure statement>.I =
         { x | for each element y in <block>.I, if y is a formal
           parameter then x is used in the corresponding actual
           parameter; otherwise x = y, a global variable }

     <procedure statement>.O =
         { x | for each element y in <block>.O, if y is a formal
           parameter then x is the corresponding actual parameter;
           otherwise x = y, a global variable }

(7)  <for statement> ::= for id := <expression>' (to | downto)
                         <expression>" do <statement>

     <for statement>.I = <expression>'.I + <expression>".I +
                         <statement>.I
     <for statement>.O = <expression>'.O + <expression>".O +
                         <statement>.O

(8)  <while statement> ::= while <expression> do <statement>

     <while statement>.I = <expression>.I + <statement>.I
     <while statement>.O = <expression>.O + <statement>.O

(9)  <repeat statement> ::= repeat <statements> until
                            <expression>

     <repeat statement>.I = <statements>.I + <expression>.I
     <repeat statement>.O = <statements>.O + <expression>.O

(10) <if statement> ::= if <expression> then <statement>'
                        [ else <statement>" ]

     <if statement>.I = <expression>.I + <statement>'.I
                        [ + <statement>".I ]
     <if statement>.O = <expression>.O + <statement>'.O
                        [ + <statement>".O ]
(11) <case statement> ::= case <expression> of <cases> end

     <case statement>.I = <expression>.I + <cases>.I
     <case statement>.O = <expression>.O + <cases>.O

(12) <cases>' ::= <one case> [ ; <cases>" ]

     <cases>'.I = <one case>.I [ + <cases>".I ]
     <cases>'.O = <one case>.O [ + <cases>".O ]

(13) <one case> ::= <constants> : <statement>

     <one case>.I = <statement>.I
     <one case>.O = <statement>.O

(14) <compound statement> ::= begin <statements> end

     <compound statement>.I = <statements>.I
     <compound statement>.O = <statements>.O


(15) For <expression>, if it does not involve any function
     call, then <expression>.I will be the set of variables
     used in the expression and <expression>.O will be empty.
     If a function call is involved, then the equations for
     <procedure statement> can be used to derive data flow
     information for that function call. The resulting data
     flow information will be the union of these two parts.


The minimum requirement for data flow analysis is that
these attributes are attached to the nodes denoting conditional
expressions or assignment statements. However, the
representation described here is able to greatly reduce tree
traversal, since we can immediately determine if a structured
statement contains any references to a particular variable.
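
The attribute equations above can be illustrated with a small bottom-up computation. The following Python sketch is for illustration only: the node classes (Expr, Assign, While, If, Seq) and the function name attrs are hypothetical and are not part of the representation described in this report. It covers only expressions without function calls, assignments, while and if statements, and statement sequences, with "+" read as set union.

```python
from dataclasses import dataclass


@dataclass
class Expr:
    uses: frozenset            # variables read; no function calls assumed


@dataclass
class Assign:
    target: str
    value: Expr


@dataclass
class While:
    cond: Expr
    body: object


@dataclass
class If:
    cond: Expr
    then: object
    orelse: object = None


@dataclass
class Seq:
    stmts: list


def attrs(node):
    """Return the pair (I, O) of input and output variable sets for a node."""
    if isinstance(node, Expr):
        return set(node.uses), set()            # rule (15), no function call
    if isinstance(node, Assign):
        i, o = attrs(node.value)
        return i, o | {node.target}             # rule (5)
    if isinstance(node, While):
        parts = [node.cond, node.body]          # rule (8)
    elif isinstance(node, If):
        parts = [node.cond, node.then]          # rule (10)
        if node.orelse is not None:
            parts.append(node.orelse)
    elif isinstance(node, Seq):
        parts = node.stmts                      # sequence of statements
    else:
        raise TypeError(f"unsupported node: {node!r}")
    i, o = set(), set()
    for part in parts:                          # union of the component sets
        pi, po = attrs(part)
        i |= pi
        o |= po
    return i, o


# while i < n do begin s := s + i; i := i + 1 end
loop = While(
    Expr(frozenset({"i", "n"})),
    Seq([
        Assign("s", Expr(frozenset({"s", "i"}))),
        Assign("i", Expr(frozenset({"i"}))),
    ]),
)
loop_inputs, loop_outputs = attrs(loop)
```

With the sets attached to the while node, a question such as "does this loop modify n?" is answered by a membership test on loop_outputs, without visiting the loop body.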
As a result of this extension to the basic program
representation,  a  corresponding  extension  has  been  made  to  the 
editor's  incremental  analysis  procedure,  in  order  to  keep  these 
data  flow  attributes  up-to-date  while  the  program  is  being 
modified.  Although  we  have  only  used  these  data  flow 
attributes  for  performing  program  slicing,  they  can  also  be 
used  for  data  flow  analysis  of  a  more  general  kind. 

4.2.2  The Construction Of The Representation

For existing programs, a compiler-like process needs to be
initiated to form the representation by generating a tree node
for each language construct as soon as it is recognized by the
parser. This process should present no problem, since existing
programs in their production version are presumably both
syntactically and semantically correct. The compiler or the
interpreter of a particular programming language can be
modified for this type of conversion. This conversion is,
however, a one-time batch process. After the conversion has
been carried out, the program representation is subject to
modification, but this can then be handled by a syntax-directed
editor. The editor is suitable not only for introducing new
code into existing programs, but also for developing new
programs.


4.3  The Program Slicer

4.3.1  The Concept Of Program Slicing

The purpose of "slicing" a program is to automatically
extract sections of the program which are closely related to
each other, with the aim of providing the information on which
the programmer wishes to concentrate by removing those sections
of the program which are not considered relevant to the
modification task.

The term "program slicing" was first introduced by Weiser
[WEIS81]. The interrelationships of program sections in a
program slice were restricted to those which can be detected by
data flow analysis. We also follow this restricted definition.
A program slice can be constructed as follows:

1)  Locate the statement in the program at which program
    slicing should start.

2)  Decide which variables are of interest to the programmer.

3)  Use data flow analysis techniques to identify all of the
    program which may affect the values of the selected
    variables.

Thus, the input to the program slicer consists of a program, a
distinguished program statement and a set of program
variables. The output consists of a set of statements of the
program. The program slicer itself depends on data flow
analysis techniques. The behavior of the statements selected
by the program slicer will be partially equivalent to the
behavior of the original program with respect to the selected
variables and initial statement. Under the assumption that no
non-terminating loop exists in the program, the behavior of the
program slice and that of the original program with respect to
the selected variables and initial statement are totally
equivalent [WEIS81].
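
The three steps can be illustrated on the simplest possible case, a straight-line sequence of assignments. The Python sketch below is an illustration under stated assumptions, not the slicer described in this report: each statement is reduced to a hypothetical pair (assigned variable, variables used), and the function name backward_slice is invented for the example.

```python
def backward_slice(stmts, start, variables):
    """Return the indices of the statements, at or before position `start`,
    which may affect the values of `variables` at that position.

    stmts is a list of (assigned_variable, used_variables) pairs describing
    straight-line assignments only (no branches or loops).
    """
    relevant = set(variables)          # step 2: variables of interest
    chosen = []
    for index in range(start, -1, -1):  # step 3: walk backwards from `start`
        target, uses = stmts[index]
        if target in relevant:
            relevant.discard(target)   # this assignment defines the value ...
            relevant |= set(uses)      # ... from the variables it reads
            chosen.append(index)
    return sorted(chosen)


program = [
    ("count", []),                     # 0: count := 0
    ("accum", []),                     # 1: accum := 0
    ("count", ["count"]),              # 2: count := count + 1
    ("accum", ["accum", "order"]),     # 3: accum := accum + order
    ("average", ["accum", "count"]),   # 4: average := accum / count
]
count_slice = backward_slice(program, 4, {"count"})
```

Slicing on count selects only statements 0 and 2, while slicing on average at the same position selects all five statements.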

Weiser [WEIS81, 82] has shown that slices constructed in
this way were recognized by subjects who, under experimental
conditions, were asked to perform modifications to several
programs. This result indicates that the subjects had
(mentally) constructed program slices relevant to the
modifications in order to modify the programs. However, the
program slicer was not available for use by the subjects of the
experiment. Furthermore, this program slicer operated on a
conventional form of data flow graph [HECH77] (i.e. a directed
graph whose nodes represent the conditional and assignment
statements of the program and whose edges represent possible
control flow paths between them). Such a program slicer
produces program slices with syntactic information that is
insufficient to display a slice as a syntactically correct
program.
In an interactive programming environment, the slicer must
present the programmer with a view of the program which
corresponds to that presented by other tools in the programming
environment. In this case, and in normal practice, this means
that the text of the code in the slice must be displayed.
Although the program slice has been defined in terms of data
flow analysis and the selection of statements, in many
programming languages, data declarations play a very important
role. In such programs, it is necessary that program slices
also include those declarations which declare all the objects
used in those slices. Our program slicer meets these
requirements, interactively constructing the text of partial
programs which are made up of the subsets of declarations and
statements of the original program which satisfy the slicing
criterion and form a legal program. To achieve this, we
extended a program representation, which we had developed to
describe both the syntax and semantics of programs, to include
the data flow information needed by the program slicer. Figure
4.3 illustrates the program slicing technique when applied to a
small program.

Our current approach is based on an intramodule program
slicer, which selects that portion of a module (i.e. procedure
or function) which satisfies the slicing criterion, and
includes declarations of objects inside or outside the module
which are necessary to ensure that the slice is indeed a
syntactically correct program. In PASCAL [JENS74] these
objects include labels, constants, types, variables, procedures
and functions.

(a)
    type
      applerecord =
        record appletype: (golden, smith);
               rotten: boolean; order: integer;
               cost: dollars
        end;
    var
      i, count: 1 .. 20;
      average: real;
      apple: array [1 .. 20] of applerecord;
      accum: integer;
      ...
    count := 0; ...
    for i := 1 to 20 do
      with apple[i] do
        if not rotten then
          begin count := count + 1;
                accum := accum * order
          end;
    if count > 0 then average := accum / count;

(b)
    type
      applerecord =
        record rotten: boolean; order: integer;
        end;
    var
      ...
      i, count: 1 .. 20;
      apple: array [1 .. 20] of applerecord;
      ...
    count := 0; ...
    for i := 1 to 20 do
      with apple[i] do
        if not rotten then count := count + 1;

Figure 4.3.  (a) Portions of the program to be modified
             (b) Portions of the slice constructed for the
                 variable COUNT.

4.3.2  Algorithms For Syntax-Directed Program Slicing

To perform "syntax-directed" program slicing, we have
developed the following algorithm, which operates on the parse
tree of the program, to attach data flow sets to each statement
and expression node of the tree.

The inputs to the program slicer are the augmented parse
tree of the program, the point at which slicing should start
(specified by the current position of the "cursor" within the
parse tree) and a set of variables to be used to construct the
slice.

The behavior of the algorithm depends on the particular
statement type, based on the possible data flow paths which are
permitted by the semantic definition of the statement type.
The basic "generic" statement types are sequence, selection and
iteration. In PASCAL-S, these are represented respectively by
the compound statement type, the if and case statement types,
and the while, repeat and for statement types. Other statement
types may be collectively referred to as assignment statements.


2)  slice_selection (St: statement ;
                     var SU: set of variable names) ;

        OSU := SU ;  T := OSU ;
        slice (last_unsliced_choice_of (St), T) ;
        NSU := T ;
        while St has more unsliced choices do
            T := OSU ;
            slice (youngest_unsliced_choice_of (St), T) ;
            NSU := T ∪ NSU
        end while ;
        SU := NSU ∪ Expr.Inputs ;
        include_expression (Expr)

3)  slice_iteration (St: statement ;
                     var SU: set of variable names) ;

        OSU := SU ;  T := OSU ;
        slice (body_of (St), T) ;
        NSU := T ;  T := (T - OSU) ∪ Expr.Inputs ;
        while T ≠ ∅ do
            OSU := OSU ∪ T ;
            slice (body_of (St), T) ;
            NSU := NSU ∪ T ;  T := T - OSU
        end while ;
        SU := NSU ;
        include_expression (Expr)

4)  slice_assignment (St: statement ;
                      var SU: set of variable names) ;

        SU := SU - St.Outputs ;
        SU := SU ∪ St.Inputs ;


The slicing algorithm progresses by traversing the parse
tree in an order which visits the statement nodes which precede
the initially chosen statement node (according to the program's
control flow) in reverse control flow order. When structured
statements are encountered, they are considered in a top-down
order, being considered only while their output data set (the
variables affected by that statement) overlaps with the current
set of "slice variables".
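
The traversal just described can be sketched in Python as follows. This is a minimal illustration, not the implementation reported here: the Node class, its field names and the function slice_node are hypothetical. Two assumptions of the sketch are labeled in the comments: an if statement without an else part is modeled with an empty sequence as its second choice, and the variables of interest after a loop are kept of interest before it, because the loop body may execute zero times.

```python
class Node:
    """Parse tree node carrying data flow sets (names are hypothetical)."""

    def __init__(self, kind, inputs=(), outputs=(), children=(), cond=()):
        self.kind = kind                 # ASSIGNMENT, SEQUENCE, SELECTION, ITERATION
        self.children = list(children)
        self.cond = set(cond)            # inputs of the controlling expression
        self.inputs = set(inputs) | self.cond
        self.outputs = set(outputs)
        for child in self.children:      # structured nodes: union over children
            self.inputs |= child.inputs
            self.outputs |= child.outputs
        self.included = False


def slice_node(st, sv):
    """Slice st in reverse control flow order.

    sv holds the slice variables after st; the result holds them before st.
    """
    if not (sv & st.outputs):
        return set(sv)                   # st cannot affect sv: skip the subtree
    st.included = True
    if st.kind == "ASSIGNMENT":
        return (sv - st.outputs) | st.inputs
    if st.kind == "SEQUENCE":
        for child in reversed(st.children):
            sv = slice_node(child, sv)
        return sv
    if st.kind == "SELECTION":
        nsv = set()
        for choice in st.children:       # every choice starts from the same set
            nsv |= slice_node(choice, set(sv))
        return nsv | st.cond
    if st.kind == "ITERATION":
        body = st.children[0]
        osv = set(sv)
        t = slice_node(body, set(osv))
        nsv = set(t)
        t = (t - osv) | st.cond
        while t:                         # repeat until no new variables appear
            osv |= t
            t = slice_node(body, t)
            nsv |= t
            t = t - osv
        return nsv | sv                  # assumption: body may run zero times
    raise ValueError(st.kind)


def assign(target, uses):
    return Node("ASSIGNMENT", inputs=uses, outputs=[target])


# count := 0; for ... do if not rotten then begin count := ...; accum := ... end
init = assign("count", [])
inc = assign("count", ["count"])
acc = assign("accum", ["accum", "order"])
test = Node("SELECTION", cond=["rotten"],
            children=[Node("SEQUENCE", children=[inc, acc]),
                      Node("SEQUENCE")])        # assumption: empty else part
loop = Node("ITERATION", children=[test])       # loop bounds are constants
prog = Node("SEQUENCE", children=[init, loop])
before = slice_node(prog, {"count"})
```

In this example the two assignments to count, the if statement and the loop are marked as included, while the assignment to accum is skipped because its output set does not overlap the slice variables.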


All statements chosen by the slicing algorithm are
"included" in the resultant slice of the program. To ensure
the correctness of the syntax of the slice, declarations of any
objects used in the included statement must also be included in
the slice. These "objects" include named constants and
variables, and their associated type definitions, as required.

In the case of our intramodule slicer, we must also devise
an approach to deal with calls to other modules. When such
calls are included, we have adopted the convention of including
an empty version of the called procedure in the slice. This
version of the procedure includes its name, type (if any) and
formal parameter list, together with an empty declaration part
and an empty compound statement for its body. This is
sufficient to satisfy the syntactic requirements of the
language.

Since block structured languages such as PASCAL permit
access to "objects" declared at any one of several upper
levels, we have chosen to preserve the upper levels in the body
of the slice. Thus, the slice will include "empty" versions of
all procedures which contain the module being sliced. Within
each of these procedures will also appear the declarations of
any objects which were previously declared in that procedure,
and which are needed within the slice. Clearly, the inclusion
of declarations within the slice is important for helping
program modification.

The nodes which are included within a slice form a subset
of the nodes in the parse tree of the program. However, they
can be rejoined to form another parse tree, using the edges
which existed in the original tree as a guide. The new parse
tree constructed in this way is used to display the text of the
slice identified by the slicing algorithm.
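
One way to rejoin the included nodes is to attach each included node to its nearest included ancestor. The Python sketch below illustrates this idea only; the dictionary-based tree and the function name rejoin are hypothetical and do not describe the data structures of the implemented slicer.

```python
def rejoin(node, included):
    """Return the list of subtrees built from the included nodes under `node`.

    A node is a dict with a "name" and a list of "children". Excluded nodes
    are dropped and their included descendants are promoted to the nearest
    included ancestor, following the edges of the original tree.
    """
    kept = []
    for child in node["children"]:
        kept.extend(rejoin(child, included))
    if node["name"] in included:
        return [{"name": node["name"], "children": kept}]
    return kept


original = {
    "name": "body",
    "children": [
        {"name": "count := 0", "children": []},
        {"name": "for", "children": [
            {"name": "if", "children": [
                {"name": "count := count + 1", "children": []},
                {"name": "accum := accum * order", "children": []},
            ]},
        ]},
        {"name": "if count > 0", "children": []},
    ],
}
included_names = {"body", "count := 0", "for", "if", "count := count + 1"}
slice_tree = rejoin(original, included_names)[0]
```

The resulting tree keeps the original nesting of the included statements, so a pretty-printer can display it in the same way as the full program.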


However, these algorithms are not sufficiently general to
allow the programmer to select statements arbitrarily. For
instance, if the programmer selects a statement from the center
of a sequence of statements, the "slice_sequence" procedure
must be altered to start slicing from the selected statement,
instead of starting from the last statement in the statement
sequence. To handle this and similar cases involving
"slice_selection" and "slice_iteration", we have written
modified algorithms to perform slicing on partial parse trees.

First of all, it is necessary to construct a list L of all
statements which enclose the selected statement. This list can
be constructed in a straightforward manner from the parse tree,
by visiting the "parent" of each node until the body of the
module is reached. The following algorithm, a modified form of
the "slice" procedure, is used:
procedure part_slice (St: statement ;
                      L: list of statements ;
                      var SU: set of variable names) ;

    if L is empty then slice (St, SU)
    elsif SU ∩ St.Outputs ≠ ∅ then
        case St.statement_type of
            SEQUENCE:   slice_part_sequence (St, L, SU) ;
            SELECTION:  slice_part_selection (St, L, SU) ;
            ITERATION:  slice_part_iteration (St, L, SU) ;
            ASSIGNMENT: slice_assignment (St, SU) ;
        end case
        include_statement (St)
    end if ;

end procedure


As an illustration, we show the modified form of the
procedure "slice_sequence". Similar modifications must be done
to "slice_selection" and "slice_iteration".


slice_part_sequence (St: statement ;
                     L: list of statements ;
                     var SU: set of variable names) ;

    part_slice (head (L), tail (L), SU)
    while St has more unsliced children
          preceding head (L) do
        slice (youngest_unsliced_elder_sibling_of (head (L)), SU)
    end while


The initial call to start slicing will be:
part_slice (head (L), tail (L), SU).
4.3.3  Enhancements

To improve the usefulness of this program slicer as a
programming aid, we have added the options of further applying
the slicer to existing slices of a program to obtain a more
refined picture of program behavior and of combining slices
(possibly those of distinct modules) into more comprehensive
units. We have defined the following operations for combining
program slices into larger units, including statements taken
from several modules:

UNION: Given two slices, S1 and S2, construct a third
slice S3 which contains all the statements and declarations
which appear in either S1 or S2.

INTERSECT: Given two slices, S1 and S2, construct a third
slice S3 which contains all the statements and declarations
which are common to both S1 and S2.

By the definition of a program slice, each of these
operations will always ensure that the slice S3 satisfies the
requirements of a program slice, and also ensures that it will
be syntactically correct. Since a program slice can be
considered as a parse tree, or even as a set of nodes taken
from a parse tree, these operations are readily implemented
using well-known algorithms for set operations.
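
Treating a slice as a set of parse tree nodes, the two operations reduce to ordinary set union and intersection. The following Python sketch assumes, purely for illustration, that each node is identified by a hypothetical (module name, node number) pair; the identifiers and function names are not taken from the implemented system.

```python
def union(s1, s2):
    """UNION: all statements and declarations appearing in either slice."""
    return s1 | s2


def intersect(s1, s2):
    """INTERSECT: all statements and declarations common to both slices."""
    return s1 & s2


# Slices represented as sets of (module, node number) identifiers.
slice_count = frozenset({("sort", 1), ("sort", 4), ("sort", 7)})
slice_temp = frozenset({("sort", 1), ("sort", 5), ("swap", 2)})

combined = union(slice_count, slice_temp)
shared = intersect(slice_count, slice_temp)
```

Because node identifiers carry the module name, the union of slices taken from distinct modules is simply a larger set covering both modules.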
4.4  The Syntax-directed Editor

We can make the following observations on existing
syntax-directed editors: They are designed specifically for
program development, emphasize the creation of programs in a
top-down fashion, are based on the abstract parse tree,
incorporate an incremental semantic evaluation mechanism, and
are highly experimental in nature.

One major contribution made by existing syntax-directed
editors is that a program is treated as a well-formed
collection of syntactic units (language constructs), not just
text. The actions carried out by these editors can be
classified as syntactic editing operations because the
syntactic structure of the program will be affected as an
immediate result of these operations. The programmer using
these "syntactic" editing operations should, however, expect
"semantic" effects as well. Most program editors do perform
semantic checking, which is enforced in conjunction with the
syntactic editing operations.

In this section we briefly describe a new type of program
editor which also supports incremental analysis and update
using a tree representation of programs, and displays program
text using a screen-oriented pretty-printer. The editor,
however, is based on the class of editing operations which are
termed semantic editing operations, in the sense that not only
the syntactic structure of the program is affected, but also
each of these operations has a meaning (semantics) which is
defined by the context in which the operation is performed.

For example, suppose that the cursor is positioned over a
constant definition. The programmer can add a new constant
definition, appearing after the current one, by issuing an
insert operation. The programmer does not have to explicitly
specify the intention to insert a new "constant" definition.
Knowledge of the immediate semantic effects of the editing
operation is therefore shared between the programmer and the
system. More complicated semantic effects, such as
multi-declarations, are still subject to tracing by the system
alone.

There are at least two major advantages in using this kind
of editing operations:

1.  Since the programmer is made aware of the structures of the
    programming language, modifications are performed as
    operations on these structures, rather than as operations
    on a piece of text. We believe this to be a more reliable
    and informative way of modifying programs, although certain
    textual operations are still valuable.

2.  Less information needs to be provided by the programmer
    because the cursor position helps the editor to determine
    the meaning of each operation.

We have defined three classes of commands: basic
modification commands, cursor movement commands, and extended
modification commands. Programmer's modifications can be
translated into the underlying operations for each command. The
programmer's view of the editing operations, however, uses a
more friendly notation than the commands described in that
paper.

4.4.1  Incremental  Editing 

Very often one may prefer, at intermediate stages of
program editing, some syntactic structures of a program to be
temporarily incomplete. Therefore, the concepts of
"templates", "placeholders" and "phrases", as described in the
Cornell system [TEIT81], are also used in our system. These
concepts are illustrated in the following example:

    insert a "while" statement after the "for" statement

    for i := 1 to 20 do apple[i] := pie[i];
    while <<condition>> do <<statement>>;

    "while" template (construct):            while ... do ... ;
    placeholders (components of construct):  <<condition>>, <<statement>>

"Phrases", which we call "primitive strings", are subject
to parsing. A simple "recursive descent" parser is included in
our system to perform this limited parsing. The process is
incremental only in the sense that, after parsing, the
resulting subtree is included in the existing program tree.


The  set  of  basic  modification  commands  is  suitable  for 
updating  programs  in  a  more  incremental  manner,  while  the  set 
of  extended  modification  commands  takes  advantage  of  the 
existing  program  constructs. 


4.4.2  Legitimate Operations


Not all types of editing commands can be applied to each
language construct. For example, in PASCAL, the DELETE
operation can be applied to the "ELSE" part of the IF_THEN_ELSE
construct to delete the keyword ELSE and all the statements of
the "ELSE" body. The operation, however, may not be applied to
the "THEN" part of the IF_THEN_ELSE construct. Note that all
the statements of the "THEN" part can be deleted to leave an
empty "THEN" part.


We have defined a Legitimate Operation Table which
records, for each language construct, the type of semantic
editing operations that can be applied. Figure 4.4 shows part
of the table for the programming language PASCAL. Whenever the
programmer specifies an operation to be performed, the editor
must consult the table to determine the legitimacy of the
intended operation.
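
The table lookup can be sketched as follows. The three rows are transcribed from Figure 4.4; the Python encoding (one character per operation, "0" for legitimate and "X" for illegitimate) and the function name is_legitimate are assumptions made for this illustration and do not describe the editor's actual data structures.

```python
OPERATIONS = ("ADD", "INSERTA", "INSERTB", "CHANGE", "DELETE")

# One row per language construct, one column per operation,
# in the order given by OPERATIONS.
LEGITIMATE_OPERATION_TABLE = {
    "IF_THEN_ELSE": "000X0",
    "THEN":         "00XXX",
    "ELSE":         "0XXX0",
}


def is_legitimate(construct, operation):
    """Return True if `operation` may be applied to `construct`."""
    row = LEGITIMATE_OPERATION_TABLE[construct]
    return row[OPERATIONS.index(operation)] == "0"


can_delete_else = is_legitimate("ELSE", "DELETE")
can_delete_then = is_legitimate("THEN", "DELETE")
```

The two lookups reproduce the example in the text: DELETE is accepted on the "ELSE" part and rejected on the "THEN" part.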

4.4.3  Incremental Analysis

The major function of incremental analysis is to perform
incremental evaluation of the static semantics of programs.
According to the characteristics of the operation, the current
cursor position in the program representation and the new
information to be included in the case of ADD, INSERTA, INSERTB
and CHANGE operations, consistency checks of the static
semantics of the program being modified must be made.

For each entry in the Legitimate Operation Table, certain
semantic "hooks" may be defined. These semantic hooks trigger
the invocation of related semantic checking routines, when the
entry indicates that the operation is legitimate. For example,
the command to change an assignment statement "a := b+c" to
"a := b+d" may be hooked to three semantic checking routines:

1.  Check whether the variable "d" has been declared or not.

2.  Check whether the variable "d" can be used as an operand in
    this statement, according to its type.

3.  Perform type coercion.

                                  Operations
    Language
    Constructs        ADD   INSERTA   INSERTB   CHANGE   DELETE

    CONST              X       0         0         0        0
    VAR                X       0         0         0        0
    PROCEDURE_CALL     0       0         0         X        0
    ACTUAL_PARM        X       0         0         0        0
    BEGIN_END          0       0         0         X        0
    IF_THEN_ELSE       0       0         0         X        0
    THEN               0       0         X         X        X
    ELSE               0       X         X         X        0
    WHILE              0       0         0         X        0
    EXPRESSION         X       X         X         0        X
    TEMPLATE           X       X         X         0        X

    0 : legitimate operation
    X : illegitimate operation

Figure 4.4.  A part of a legitimate operation table.

Since temporary semantic inconsistency at intermediate
stages of the program modification activity should be
tolerated, the language constructs involved may be highlighted
to indicate the violation until it is removed. For example, if
a variable declaration is deleted, a list of usages of this
variable in the program will remain, in which each element
represents a semantic inconsistency (i.e. an undeclared
variable). The system should assist the programmer in
identifying this list of semantic inconsistencies.

Figure 4.5 shows a more complex case, in which a new
variable is introduced into a procedure B which is nested
within another procedure A. The original variable CURSOR was
declared in procedure A, and used in both of the procedures A
and B. If a new variable CURSOR is declared in procedure B,
this will override the previous declaration. A very likely
consequence is that the usages in procedure B of the original
variable CURSOR will also become invalid.

This may be because the attributes of the two variables
are totally different. Even if these two variables have
identical attributes, the programmer's intention is still
unknown. If these two variables have identical attributes, a
compiler must take into account the new declaration, and apply
it to all the "usages" of the variable CURSOR declared in
procedure B. However, since the programmer was making
modifications to an existing program, he might not be aware of
the existence of another variable of the same name. Compilers
are obviously ineffective in detecting this kind of
("injected") error.

    Procedure A;
      Var CURSOR : integer;

      Procedure B;
        ...
      begin (* procedure B *)
        { CURSOR used }
      end; (* procedure B *)

    begin (* procedure A *)
      { CURSOR used }
    end; (* procedure A *)

Figure 4.5.  Insertion of a local variable.


By comparison, a highly responsive program editor can
assist programmers in detecting them at the earliest possible
stage. We do, therefore, feel that it is the responsibility of
the editor to inform the programmer about such dangers, and to
require the programmer to resolve the ambiguity.
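
A check of this kind might be sketched as follows. The Python function check_new_declaration, its arguments and the wording of the warning are hypothetical; the sketch only illustrates the idea of looking for an existing declaration of the same name in the enclosing procedures before a new local declaration is accepted.

```python
def check_new_declaration(name, enclosing_scopes, usages):
    """Warn when declaring `name` would hide an outer declaration.

    enclosing_scopes lists (procedure, declared names) pairs for the
    procedures enclosing the one being edited, innermost first.
    usages lists the existing references to `name` in the edited procedure.
    Returns a warning string, or None when there is no conflict.
    """
    for procedure, declared in enclosing_scopes:
        if name in declared:
            return (f"{name} is already declared in procedure {procedure}; "
                    f"{len(usages)} existing usage(s) would refer to "
                    f"the new declaration")
    return None


# Declaring CURSOR in procedure B, which is nested within procedure A.
warning = check_new_declaration("CURSOR", [("A", {"CURSOR"})], ["usage in B"])
no_warning = check_new_declaration("TEMP", [("A", {"CURSOR"})], [])
```

An editor could display such a warning together with the highlighted usages, leaving it to the programmer to confirm or withdraw the new declaration.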

4.4.4  Incremental Update Of Data Flow Information

Once a modification is made to a node of the tree model,
the data flow equations are used to update the data flow
information for that node. Since the data flow information for
most nodes is derived from its children, changes will be
propagated to the ancestors of the modified node as far as
possible. If we let the propagation of changes proceed each
time a modification is made, we will find that there are two
immediate disadvantages. First, this propagation for
large-scale software systems may continue for a long time.
Second, if the next modification is made to a descendant node
of the current node, this propagation of changes is not only
wasteful, but also unnecessary.

Since the cursor movement along the parse tree is
continuous, we realize that updating data flow information for
the current node should be done only when the next move is to a
sibling node or to the parent node. This scheme reduces the
response time significantly and still guarantees that data flow
information is ready whenever the subtree rooted at the current
node is referenced. Whenever a slicing command is entered,
propagation will be performed until the propagation reaches the
root of the tree or stops at some node which has no change in
its data flow information.

One of the major assumptions of the above scheme is that
only the nodes which lie on the path from the root of the
current block to the current node have incorrect data flow
information. Otherwise, we assume that the data flow
information of descendant nodes and sibling nodes is correct at
any instant, even when procedure statements or function calls
exist. This is not true if we do not update the data flow
information of procedure statements and function calls when the
data flow information of the corresponding block changes. We
consider these changes as side effects which are created when
the data flow information of a block is changed. In this case,
all the procedure statements or function calls which refer to
this block must also be updated, and we must propagate the
change in data flow information as far as possible.
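
The upward propagation, stopping at the root or at the first node whose information does not change, can be sketched as follows. The Python class TreeNode and the function propagate are hypothetical; only output sets are tracked, and the side effects on procedure statements and function calls discussed above are not modeled.

```python
class TreeNode:
    """Tree node whose output set is derived from its children."""

    def __init__(self, own_outputs=(), children=()):
        self.own = set(own_outputs)      # outputs contributed by the node itself
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self
        self.outputs = self.recompute()

    def recompute(self):
        result = set(self.own)
        for child in self.children:
            result |= child.outputs
        return result


def propagate(node):
    """Update outputs from `node` upward; return the number of nodes visited.

    Propagation stops at the root, or at the first node whose data flow
    information is unchanged.
    """
    visited = 0
    while node is not None:
        visited += 1
        updated = node.recompute()
        if updated == node.outputs:
            break
        node.outputs = updated
        node = node.parent
    return visited


a = TreeNode(own_outputs={"x"})
b = TreeNode(own_outputs={"x"})
c = TreeNode(own_outputs={"z"})
p = TreeNode(children=[a, b])
root = TreeNode(children=[p, c])

a.own = {"y"}                  # modify a: the change reaches the root
first = propagate(a)
b.own = {"x", "y"}             # modify b: p is unchanged, so propagation stops
second = propagate(b)
```

The first change is carried up to the root, while the second stops at the parent, whose output set is unchanged.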
4.4.5  Interactive Pretty-printing

The function of the screen-oriented pretty-printer is to
allow the programmer to view the portion of the program being
edited. The programmer first uses the cursor commands to
examine the program, then uses the editing commands to modify
the program. The pretty-printer responds to cursor commands,
and rebuilds the screen display according to program changes,
by examining the program representation. As a result, the
pretty-printer provides instant visual feedback to assist the
programmer in perceiving program changes in an interactive
manner. Figure 4.6 shows the various cursor positions
resulting from a sequence of cursor movements.


4.5  Software Development

We are currently completing an implementation of a
prototype version of the system shown in Figure 4.2. The
system has been written in PASCAL and runs on our VAX-11/780
computer. Our choice for the first target programming language
is PASCAL-S, a subset of PASCAL [WIRT75]. The program
representation is implemented as a set of fixed length PASCAL
records, each of which corresponds to a construct in the
PASCAL-S language.
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ocedure  sort; 

var  counter#  pointer#  temp  :  integer; 
beoin 

counter  : =  20; 
while  counter  >  1  do 
bea  in 

pointer  :  =  l; 

*1 

while  pointer  <  counter  do. 

*2  *3 

fe&aia 

*4 

if.  1  isttpointerj  <  1  istCpo inter  +  13  then 
*5 

beg  in 

temp  : =  1  istCpo inter 3 ; 

1  istCpo  inter  3  :s  1  istCpo inter  +  13 ; 

1  istCpo  inter  +  13 :=t*mp; 
end; 

pointer  :s  pointer  +  i; 

*6 

end; 

counter  :  =  counter  -  l; 
end; 

end; 


it  ion 

*1  to 

posit  ion 

*2 

down  n 

•  • 

*2 

•t 

*3 

RIGHT 

M 

*3 

M 

*4 

DOWN 

M 

*4 

•  • 

*5 

RIGHT 

ft 

*5 

•  e 

*6 

DOWN 

•  • 

*6 

•# 

*7 

DIAGONAL 

Figure 4.6.  An example showing a sequence of cursor movements.

To convert existing PASCAL-S programs to the program
representation, we have modified the PASCAL-S interpreter by
Wirth [WIRT75] so that we can use the syntax-directed editor to
modify the program. Of course, the editor can also be used for
new program development. Our implementation of the
pretty-printer has been enhanced by using an "extended cursor"
[TEIT81] to highlight an entire programming language construct.
The program slicer is now operational on individual modules.
However, using the operations UNION and INTERSECTION of program
slices, it is possible to construct program slices using
intermodule data flow.

These software tools (modules) communicate with one
another by updating and examining the value of the current
position indicator in the tree, given by a global variable
TREE_CURSOR. Figure 4.7 shows the communication pattern. The
utility programs described in Section 4.1 have also been
implemented, and they are often used as off-line tools. To
provide the programmer with more information about the program
and the status of the modification, we use a multi-display
system. Two CRT terminals are used simultaneously, one for
displaying program fragments and the other for user interface.
This will allow functions such as issuing commands, entering
character strings (primitive information) and receiving system
messages


[Diagram: the Program Slicer (criterion), the Syntax-directed
Editor (modification) and the Pretty-printer (adjusting display)
are connected through TREE_CURSOR to the program tree, which is
drawn from ROOT downward.]

Figure 4.7.  The communication pattern of the integrated tools.


By integrating these tools using our program
representation, we have provided an environment in which
different activities involved in program modification may be
coordinated and treated as parts of a single task. Our
experience based on testing early versions of these modules
(such as the program slicer and the pretty-printer) indicates
that our approach is feasible. Consistently using the tree
operations defined on the program model as editing operations
has been shown to be practical.

4.6  Discussion and Future Work

We have presented an approach to incremental program
modification using a set of well-integrated software tools. We
have also presented a tree-like program representation which
contains sufficient information about the program structure and
static semantics, with data flow extension, to facilitate various
analyses.

In  order  to  use  this  approach,  the  following  improvements 
need  to  be  made: 

1.  Programmers are allowed to move freely to any spot in the
program by means of the structured cursor movement
commands. We have found that the correspondence between the
position in the representation and the user's view of the
position in the text is troublesome, and that different
nodes within the tree representation often correspond to
the same piece of text. This can confuse the user as to
the exact location of the cursor in the program. Our
solution to this problem will be to refine our
implementation so that movement commands automatically skip
certain nodes which do not correspond to a distinct
construct in the user's view of the text.

2.  The  extended  cursor  provides  a  visual  cue  for  the 
programmer  by  clearly  highlighting  the  current  construct. 
Movement  commands  with  even  larger  spans  are  still  needed 
for  the  programmer's  convenience. 

3.  The set of editing commands is complete in the sense that
it allows any kind of modification. However, in order to
achieve greater efficiency, this set must be extended. For
example, multiple buffers can be introduced to facilitate
more powerful refinement actions (such as combining two
sections of code into a single construct).

From the previous discussion, it is clear that further
research is needed to provide a better environment for program
modification. For instance, in order to apply this approach to
different programming languages, the construction of the
program representation should be at least semi-automated. This
should be feasible because the program representation and the
operations on it are formally defined. Furthermore, since
separate compilation is a very important and useful feature of
programming languages, a practical syntax-directed programming
system should have facilities to support this feature. In
addition, the program modification system should easily
incorporate many other software tools, such as the ripple effect
analyzer, which will be discussed in the next section.

5.0  RIPPLE EFFECT ANALYSIS


One of the most serious problems facing the maintenance
programmer is to accurately determine the consequences of
making a particular program modification. While visual
inspection can be successful, automated analysis techniques are
likely to be more reliable. We have developed an approach to
perform automated analysis of the ripple effects of program
modification, and this approach has been demonstrated using
PASCAL programs on a DEC VAX-11/780 computer. The analysis
technique may be used to identify potential ripple effects on
both the logical and performance aspects of program behavior.
The logical ripple effect analysis technique is a significant
improvement over that previously demonstrated for JOVIAL
programs [YAU78, 80a, 80b] and is able to deal with the
problems of recursion and dynamic aliasing. In this section,
we will present both our logical and performance ripple effect
analysis techniques.


5.1  Logical Ripple Effect Analysis


The logical ripple effect analysis technique presented
here statically analyzes the changes to the data flow of
the program introduced by an initial program modification.
When the value or attribute of a variable in one portion of the
program may be changed after an initial program modification,
the variable may cause potential errors when it is used. Thus,
this variable is identified as a potential error source. As a
simple example, consider the following program segment:

S1:  x := x + 1;

S2:  y := x + z;

Suppose that the expression on the right-hand side of the first
assignment statement S1 is modified in an initial program
modification. Then the assignment of y in S2 may become
logically inconsistent with the initial modification.

Similarly, when a control condition, e.g. if (x > y), is
changed in an initial program modification, potential errors
may be introduced to the program since the execution and hence
the result of the program may be changed.

A potential error source can be a primary or a secondary
error source. A primary error source is a variable or control
condition whose value or attribute is modified by an initial
program modification. A secondary error source is then a
variable or control condition whose value or attribute may
become inconsistent with the initial program modification. In
the above example, x in S1 is called a primary error source,
while y in S2 is a secondary error source. The propagation of
the potential error sources will be referred to as potential
error flow.

To identify the potential error flow, our logical ripple
effect analysis technique identifies and utilizes the
definition and usage information commonly used in data flow
analysis techniques [ALLE74], [LOME77], [BART78], [ROSE79],
[ARTH81]. In the above example, our logical ripple effect
analysis technique will identify y as a secondary error source
based on the information that the definition of y in S2 uses x,
which is a primary error source. Hence, the scope of logical
ripple effect which can be identified using our technique is
bounded by the capabilities of its underlying data flow
analysis technique.
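
The way y is identified from definition and usage information in the two-statement example can be sketched in Python. This is only an illustration for straight-line code, not the analyzer described in this report; the tuple encoding of statements and the name secondary_sources are assumptions made for the sketch.

```python
# Hypothetical encoding of the segment: (label, defined item, used items).
SEGMENT = [
    ("S1", "x", {"x"}),       # S1: x := x + 1
    ("S2", "y", {"x", "z"}),  # S2: y := x + z
]


def secondary_sources(segment, primary):
    """Return items implicated, directly or transitively, by the primary sources.

    Assumes straight-line code executed in the order given; loops,
    branches and redefinitions that remove an error source are ignored.
    """
    sources = set(primary)      # every error source found so far
    secondary = set()
    for _label, defined, used in segment:
        if used & sources and defined not in primary:
            secondary.add(defined)
            sources.add(defined)
    return secondary


print(secondary_sources(SEGMENT, {"x"}))
```

With x as the primary error source, the definition of y in S2 uses x, so y is reported as the only secondary error source.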

Our logical ripple effect analysis technique is similar to
program analysis tools such as DAVE [FOSD76] and the program
slicing technique [WEIS81], in that they are all based on data
flow analysis of the program. However, they differ in their
applications of the data flow information. For example, DAVE
is concerned with identifying the data flow anomalies of a
program, while the program slicing technique is focused on
identifying an executable "slice" of a program which may result
in the definition of a variable at one point of the program.
Neither DAVE nor the program slicing technique is applicable to
identifying the logical ripple effect of an initial program
modification, because they do not identify the changes to the
data flow of the program after an initial program modification.
Our technique, on the other hand, provides a trace of the
program segments which may be affected by the logical ripple
effect of an initial program modification.

In  this  section,  only  the  framework  of  our  logical  ripple 
effect  analysis  technique  is  presented  through  the  development 
of  abstract  models.  These  models  can  be  applied  on  sequential 
programs  written  in  high  level  languages  such  as  FORTRAN, 
PASCAL,  etc.  Implementation  or  language  specific  details  are 
not  discussed  here. 

Our technique performs logical ripple effect analysis in
two stages. The first stage is the error flow model
construction stage, during which an intramodule error flow
model and then an intermodule error flow model will be
constructed to characterize how potential error sources can
propagate within the modified version of the program. The
intramodule error flow model characterizes how potential error
sources can propagate within the modules in the program. The
intermodule error flow model characterizes how potential error
sources can propagate between the modules in the program. The
construction of the intramodule and intermodule error flow
models has approximately the same level of complexity as the
intramodule and intermodule data flow analyses, respectively.

The second stage of our logical ripple effect analysis is
the logical ripple identification stage, which is concerned with
identifying the potential error sources implicated by an
initial program modification. This stage can be performed in
two phases. During the first phase, the primary error sources
are identified based on the initial program modification and
the error flow models of the modified program. Then, in the
second phase, the logical ripple effect will be traced
utilizing the primary error sources and the error flow models.

The  logical  ripple  effect  analysis  technique  presented 
here  is  capable  of  providing  exhaustive  tracing  of  the  logical 
ripple  effect.  It  can  be  tailored  to  support  other  strategies 
for  logical  ripple  effect  tracing.  For  instance,  an 
implementation  of  the  technique  may  provide  only  intramodule 
error  flow  tracing  which  can  be  sufficiently  effective  in  an 
environment  where  intramodule  error  flow  dominates,  while  the 
cost  of  applying  this  technique  can  be  greatly  reduced. 
Another  example  of  an  implementation  of  this  technique  is  to 
identify  only  the  error  sources  directly  implicated  by  the 
primary  error  sources. 

A prototype logical ripple effect analyzer for PASCAL
programs has been developed. This analyzer provides an
interactive environment for tracing the logical ripple effect.
The extent of the logical ripple effect tracing can be
controlled by the software maintenance programmer such that he
can choose the program areas of interest to be examined by
the logical ripple effect tracing scheme. Also, the software
maintenance programmer can eliminate from the logical ripple
effect tracing some modules or variables which, based on his
understanding of the program, are not affected by the initial
modification. Thus, our logical ripple effect analysis technique
can identify the program areas which will require additional
maintenance effort. Some experimental results of our logical
ripple effect analysis technique for PASCAL programs will also
be presented.

5.1.1  Intramodule Error Flow Model

In this section, we will present the intramodule error
flow model, and show how the propagation of potential error
sources within the modules in a program can be modelled by the
model. Before we present these, we need to make a number of
definitions.

A program module is defined to be a separately invokable
piece of code having a single entry and a single exit.
Practically speaking, a module can correspond to a SUBROUTINE
or PROCEDURE, etc. To reduce complexity, a program module is
further represented as a set of program blocks. A program
block can be either a local block or an external block. It
will be seen later that there is a sequence of three blocks in
the invoking module for each module invocation, which can be a
procedure call statement or a function reference; these
three blocks for each module invocation are called external
program blocks. A local program block contains an expression
which provides a control condition, or a simple statement other
than a procedure call statement. Each program block has a
single entry and a single exit. However, a program block may
reach or be reached by several program blocks. For example, a
program block containing an "if" clause may reach two program
blocks corresponding to the "then" and "else" parts of the "if"
statement.

The flow of control among the program blocks of a module
can be represented by a control flow graph associated with this
module. The control flow graph associated with a module m can
be expressed as a quadruple, CFG[m] = (V, B, u, v), where V is
the set of vertices representing the set of program blocks in
the module m, B is the set of branches which are ordered pairs
of vertices representing the flow of control from the exit
point of a program block to the entry point of another program
block, u is an element of V representing the entry block of the
module m, and v is an element of V representing the exit block
of the module m. Note that the entry and exit blocks of m are
used to trace the error flow into and out of m. They do not
correspond to any executable statements in m.

The intramodule error flow analysis can be simplified by
decomposing the error flow within a module into the error flow
which occurs within a program block and the error flow which
occurs between program blocks in a module. In order to analyze
the error flow within program blocks and between program
blocks, it is necessary to develop a characterization for a
program block which reflects how potential error sources may
flow within the program block.
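
The control flow graph quadruple CFG[m] described above (a vertex set, a branch set, an entry block u and an exit block v) can be held in a small data structure. The following Python sketch is an assumption-laden illustration: the class name, field names and the five-block module are hypothetical and do not come from the prototype.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlFlowGraph:
    vertices: frozenset      # program blocks of the module
    branches: frozenset      # ordered pairs (from_block, to_block)
    entry: str               # entry block u
    exit_block: str          # exit block v

    def successors(self, block):
        """Immediate successor blocks of `block`, in sorted order."""
        return sorted(t for (s, t) in self.branches if s == block)

    def predecessors(self, block):
        """Immediate predecessor blocks of `block`, in sorted order."""
        return sorted(s for (s, t) in self.branches if t == block)


# Hypothetical module: entry -> if -> (then | else) -> exit
cfg = ControlFlowGraph(
    vertices=frozenset({"entry", "if", "then", "else", "exit"}),
    branches=frozenset({
        ("entry", "if"), ("if", "then"), ("if", "else"),
        ("then", "exit"), ("else", "exit"),
    }),
    entry="entry",
    exit_block="exit",
)

print(cfg.successors("if"))
```

The block holding the "if" clause reaches two blocks, while every block still has a single entry and a single exit, as required of program blocks.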

5.1.1.1  Block Error Characteristics

The basis for the characterization of a program block
requires the identification of all data items and control items
in the program block. A data item is a member of the set of
minimal information units which describe the program. They
basically consist of the program's variables. The control
items are artificially created in our logical ripple effect
analysis to provide a basis for linking the data flow and
control flow information together in the program. A control
item is created for each control condition which determines the
execution of a statement or a group of statements. For
example, the predicate in a conditional "if" statement provides
a control condition which determines the outcome of this
decision point, and hence a control item is created to
represent the predicate. A FORTRAN "do" statement which
establishes a controlled loop also provides one type of control
item. A control item can be created in such a manner that it
will not generate any erroneous error flow in the program by
assigning to it a symbolic name which is guaranteed to be
distinct from any identifier in the program and from any other
control item. A definition is an item whose value is modified
or read in a part of a statement, or whose associated control
condition is defined in an expression. A usage is an item
whose value is referenced in a part of a statement, or whose
associated control condition can affect the execution of the
statement.

It is assumed in our error flow analysis that all data
items have a unique memory address and that this memory address
can be symbolically determined prior to program execution.
This implies that the variables with the same name but
different scopes are treated as different data items. It also
implies that all the elements in a data structure are
represented by the data structure itself. Due to the static
nature of our analysis, it is infeasible to trace the exact
error flow for programs which contain data structures.
However, the worst-case error flow can be computed by treating
a data structure as a single data item. Thus, if an element in
a data structure is affected by the error flow, the whole data
structure is considered to be affected by the error flow.

A data item is said to employ explicit addressing if it is
a simple data item; otherwise, it is said to employ implicit
addressing. An example of implicit addressing of data items is
the array data structure. A control item is treated as
employing explicit addressing although there is no memory
address corresponding to it.
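
The worst-case treatment of data structures and the distinction between explicit and implicit addressing can be illustrated with a small Python sketch. The function data_item and the idea of passing in the set of structured variable names are assumptions of this sketch; a real analyzer would presumably take this information from the declarations.

```python
import re


def data_item(reference, structured):
    """Map a Pascal-style variable reference to (data item, explicit?).

    `structured` is the set of names declared as data structures. Any
    reference into a structure (index, field or pointer selection) is
    represented by the structure itself, which uses implicit addressing.
    """
    name = re.split(r"[\[.^]", reference.strip(), maxsplit=1)[0].strip()
    return name, name not in structured


print(data_item("list[pointer + 1]", {"list"}))
print(data_item("temp", {"list"}))
```

Here every element of the array list collapses to the single item list, so an error reaching one element is treated as affecting the whole array.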

The characterization of the potential error behavior of a
block can be formally defined as follows:

Definition 5.1.  The block error characteristics of a block b
consist of two sets C[b] and P[b], and a mapping FM[b]. The
source capable set C[b] of b is the set of items which can
become error sources due to an execution of b. A subset of
C[b] consisting of the elements in C[b] which employ explicit
addressing is called the explicit source capable subset of b
and denoted by EC[b]. The potential propagator set P[b] of b
is the set of items which can implicate some secondary error
sources due to an execution of b. The flow mapping FM[b] of b
is a function from the set P[b] to the power set of C[b]. For
each element p of the set P[b], the subset of C[b] which is the
image of p under the flow mapping FM[b] is defined as

FM[b](p) = { c ∈ C[b] | p can implicate c as a secondary
error source due to an execution of b }.
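
Definition 5.1 can be mirrored by a small record type. The Python sketch below is only one possible representation; the class name, the implicit field and the sample block are hypothetical, and treating the array index i as a usage is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class BlockErrorCharacteristics:
    C: set                                      # source capable set C[b]
    P: set                                      # potential propagator set P[b]
    FM: dict                                    # flow mapping: item of P[b] -> subset of C[b]
    implicit: set = field(default_factory=set)  # items of C[b] using implicit addressing

    @property
    def EC(self):
        """Explicit source capable subset EC[b]."""
        return self.C - self.implicit

    def is_well_formed(self):
        """FM[b] must map every element of P[b] to a subset of C[b]."""
        return set(self.FM) == self.P and all(
            image <= self.C for image in self.FM.values()
        )


# Hypothetical block for the Pascal statement  a[i] := t  (a is an array).
block = BlockErrorCharacteristics(
    C={"a"},
    P={"i", "t"},
    FM={"i": {"a"}, "t": {"a"}},
    implicit={"a"},
)
print(block.EC, block.is_well_formed())
```

Because the array a employs implicit addressing, EC[b] is empty for this block even though C[b] is not.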


The error characteristics of a local block characterize
the potential error behavior of the statement or expression
contained in the block. On the other hand, the error
characteristics of a sequence of three external blocks for a
module invocation characterize the potential error behavior of
the module invocation.

It is clear that the potential propagator set and the
source capable set are needed for modelling the potential error
behavior of a block. The flow mapping, which provides the
relationships between the two sets of items, is also needed
because the sets of secondary error sources implicated by the
elements of the potential propagator set can be different when
the block is an external block used to model the potential
error behavior of a module invocation, or when multiple
assignment in a simple statement is possible for the source
language of the program to be analyzed. Furthermore, the block
error characteristics defined above are sufficient to
characterize the potential error behavior of a block because
the source capable set provides the set of items which can
become an error source, and the potential propagator set and
the flow mapping together provide the set of items which can
implicate some secondary error sources as well as their
implicated secondary error sources due to an execution of the
block.

5.1.1.2  Construction of Intramodule Error Flow Model

The construction of the intramodule error flow model for a
program is similar to the identification of the local data flow
information in data flow analysis techniques [ALLE74],
[LOME77], [BART78], [ROSE79], [ARTH81]. For a local block b,
the source capable set C[b] basically corresponds to the MODIFY
set which is commonly used in data flow analysis techniques.
This is because each definition x in b can become an error
source either due to an initial program modification to the
definition of x in b, or because x is defined in b with some
usages which are error sources. The potential propagator set
P[b] basically corresponds to the USE set in data flow analysis
techniques because each usage y in b is used to define some
definition x in b. Hence, y can implicate x as a secondary
error source if y is an error source flowing into b. Note that
the control definitions and usages are included in the block
error characterization. Furthermore, the block error
characteristics sets of the entry and exit blocks are specified
as empty sets because they do not correspond to any executable
statements.

In the intramodule error flow model construction process,
only control usages are identified in the block error
characteristics of the external blocks in the program. These
block error characteristics will be updated when the
intermodule error flow model is constructed.

The intramodule error flow model can be constructed by an
extended parser [AHO72] of the source language of the program
to be analyzed. The intramodule error flow model construction
process can be described quite formally using an attributed
grammar [AHO72] of the source language.

Example 5.1.  Consider the PASCAL program given in Figure 5.1.
This program computes the roots of a quadratic equation a*x*x +
b*x + c = 0. The block error characteristics of blocks 1
to 6, which are constructed for the procedure rroots, are shown
in Figure 5.2. The control flow in the procedure rroots is
sequential.
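
For local blocks holding a single assignment, the correspondence with the MODIFY and USE sets suggests a very simple construction, sketched below in Python. The function local_block is hypothetical and assumes one simple statement per block with no multiple assignment, so that every usage can implicate every definition of the block; it does not model external blocks or control items.

```python
def local_block(definitions, usages):
    """Build (C, P, FM) for a local block holding one simple statement.

    Assumes a single assignment per block, so every usage can implicate
    every definition of the block.
    """
    C = set(definitions)            # corresponds to the MODIFY set
    P = set(usages)                 # corresponds to the USE set
    FM = {y: set(C) for y in P}     # each usage implicates each definition
    return C, P, FM


# Block 3 of rroots:  xr1 := rrootsx1 + rrootsx2
C3, P3, FM3 = local_block({"xr1"}, {"rrootsx1", "rrootsx2"})
# Block 5 of rroots:  xi := 0
C5, P5, FM5 = local_block({"xi"}, set())

print(C3, sorted(P3), C5, P5)
```

Under these assumptions the results for blocks 3 and 5 agree with the corresponding entries of Figure 5.2.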

5.1.1.3  Intramodule Error Flow Tracing

Now,  let  us  consider  the  tracing  of  the  error  flow  within 
a  module.  The  error  flow  can  be  described  in  terms  of  the 
propagation  error  source  sets  of  the  blocks  as  defined  below. 

Definition 5.2.  The propagation error source set ES[b] of a
program block b consists of the set of error sources which
reach the exit point of the block b.

Block      Source code


           program example(input, output);
           var a, b, c, xr1, xr2, xi: real;

           procedure roots(aa, bb, cc: real);
           var x1, xr, xs, disc: real;

             procedure rroots(rrootsdisc, rrootsx1);
             var rrootsx2 : real;
1            begin
2              rrootsx2 := sqrt(rrootsdisc);
3              xr1 := rrootsx1 + rrootsx2;
4              xr2 := rrootsx1 - rrootsx2;
5              xi := 0
6            end;

             procedure iroots(irootsdisc, irootsx1);
             var irootsx2 : real;
7            begin
8              irootsx2 := sqrt(-irootsdisc);
9              xr1 := irootsx1;
10             xr2 := irootsx1;
11             xi := irootsx2
12           end;

13         begin
14           x1 := - bb / (2.0 * aa);
15           xr := x1 * x1;
16           xs := cc / aa;
17           disc := xr - xs;
18           if disc >= 0
19,20,21       then rroots(disc, x1)
22,23,24       else iroots(disc, x1)
25         end;

26         begin
27           read(a, b, c);
28           if a <> 0 then
             begin
29,30,31       roots(a, b, c);
32             writeln(xr1, xr2, xi)
             end
33           else writeln('Not a quadrat
34         end.

Figure 5.1.  An example program.

C[1] = ∅;
P[1] = ∅;

C[2] = { rrootsx2 };
P[2] = { rrootsdisc };
FM[2](rrootsdisc) = { rrootsx2 };

C[3] = { xr1 };
P[3] = { rrootsx1, rrootsx2 };
FM[3](rrootsx1) = FM[3](rrootsx2) = { xr1 };

C[4] = { xr2 };
P[4] = { rrootsx1, rrootsx2 };
FM[4](rrootsx1) = FM[4](rrootsx2) = { xr2 };

C[5] = { xi };
P[5] = ∅;

C[6] = ∅;
P[6] = ∅;

Figure 5.2.  The error characteristics of the blocks in rroots
in the program shown in Figure 5.1.


Given the error source set ES[a] of a block a, the sets of
error sources which reach the exit points of the immediate
successor blocks of a can be determined based on the set ES[a]
and the block error characteristics of these immediate
successor blocks. A tracing function f(a, b) is defined below
to derive the set of error sources which reach the exit point
of a block b, given the propagation error source set ES[a] of
an immediate predecessor block a of the block b. An error
source x will flow out of block b as a result of the incoming
error source set ES[a] if one of the following two conditions
holds:

(1)  x is implicated in b as a secondary error source by an
element of ES[a], or

(2)  x is an incoming error source which passes through b.

Thus, the tracing function f(a, b) is defined as the union
of the two sets of error sources, each of which contains all of
the error sources satisfying one of the above two conditions.
Under Condition (1), each element x in the intersection of P[b]
and ES[a] is capable of implicating a set of secondary error
sources in b because x is an incoming error source and it can
propagate potential errors to some items in b. The set of
secondary error sources implicated by x in b can be obtained by
the flow mapping on x, i.e. FM[b](x). Hence, FM[b](P[b] ∩
ES[a]) is the set of all secondary error sources implicated in
b by the incoming error source set ES[a]. Under Condition (2),
an incoming error source x cannot pass through b if it employs
explicit addressing and it is redefined in b. In other words,
x cannot pass through b if it is an element of the explicit
source capable subset EC[b] of the block b. Hence, (ES[a] -
EC[b]) is the set of incoming error sources which pass
through b. Therefore, we have


     f(a, b) = (ES[a] - EC[b]) ∪ FM[b](P[b] ∩ ES[a]).
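
The tracing function can be written directly with set operations: the incoming error sources not redefined through explicit addressing in b, together with the images under the flow mapping of the incoming sources that are also potential propagators of b. The Python sketch below is a minimal illustration; the function name, argument layout and sample block are assumptions and not part of the report's implementation.

```python
def tracing_function(ES_a, EC_b, P_b, FM_b):
    """Error sources reaching the exit of b, given ES[a] of a predecessor a."""
    passed_through = ES_a - EC_b            # condition (2)
    implicated = set()                      # condition (1)
    for x in P_b & ES_a:
        implicated |= FM_b[x]
    return passed_through | implicated


# Hypothetical block b holding  y := x + z, entered with x as an error source.
result = tracing_function(
    ES_a={"x"},
    EC_b={"y"},
    P_b={"x", "z"},
    FM_b={"x": {"y"}, "z": {"y"}},
)
print(sorted(result))
```

In this example x passes through b unchanged and implicates y, so both leave the block as error sources.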


The intramodule error flow from the points of initial
program modification to other areas in the module can then be
traced utilizing this tracing function along with the error
characteristics of the module's blocks and the control flow
graph of this module. The tracing function can be applied on a
block-immediate successor block basis to form an algorithmic
technique to trace the intramodule error flow. Applying the
tracing function on a block-immediate successor block basis
means that errors are propagated from an initial error source
block s to all immediate successor blocks t of s, and then from
t to all immediate successor blocks of t, etc. Application of
the tracing function repeatedly in this manner identifies the
propagation error source set ES[i] of a block i in a stepwise
manner, with all the error sources flowing from an immediate
predecessor block of i to i contributing to the final ES[i].
The tracing function is applied in this manner as long as new
secondary error sources are identified.

This intramodule error flow tracing scheme can be
formalized as an algorithm. It is assumed in this algorithm
that the propagation error source sets in the module m are
initialized according to some initial error flow condition.
Also assumed is an initial error source block set IB[m] of m
consisting of the blocks in m which have non-empty initial
propagation error source sets. This algorithm is given below.

Algorithm 5.1. Intramodule Error Flow Tracing

Step 1. If IB[m] is empty, then terminate. Otherwise, select
an element from IB[m] and then delete it from IB[m]. Let b
denote the selected block.

Step 2. For each immediate successor block b' of b, first
check if f(b, b') is a subset of ES[b']. If it is not, then
let ES[b'] = ES[b'] ∪ f(b, b'), and insert b' into IB[m].
After all the immediate successor blocks of b have been
examined, go to Step 1.
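
The two steps above amount to a worklist iteration, which can be sketched in Python as follows. The dictionary layout of a block (keys "EC", "P", "FM"), the function names, and the three-block module used in the usage lines are assumptions made for this sketch and are not taken from Figure 5.1.

```python
def tracing_function(es_a, block):
    """f(a, b) = (ES[a] - EC[b]) | FM[b](P[b] & ES[a]) for a block dict."""
    flow = es_a - block["EC"]
    for source in block["P"] & es_a:
        flow |= block["FM"].get(source, set())
    return flow


def trace_intramodule(blocks, successors, es, ib):
    """Sketch of Algorithm 5.1; es and ib are updated in place.

    blocks     : dict block id -> {"EC": set, "P": set, "FM": dict}
    successors : dict block id -> list of immediate successor ids
    es         : dict block id -> propagation error source set ES[b]
    ib         : initial error source block set IB[m]
    """
    while ib:                                   # Step 1
        b = ib.pop()
        for succ in successors.get(b, ()):      # Step 2
            flow = tracing_function(es[b], blocks[succ])
            if not flow <= es[succ]:
                es[succ] |= flow
                ib.add(succ)
    return es


# Hypothetical module: block 2 uses d to define t, block 3 uses t to define r.
blocks = {
    1: {"EC": set(), "P": set(), "FM": {}},
    2: {"EC": {"t"}, "P": {"d"}, "FM": {"d": {"t"}}},
    3: {"EC": {"r"}, "P": {"t"}, "FM": {"t": {"r"}}},
}
successors = {1: [2], 2: [3], 3: []}
es = {1: {"d"}, 2: set(), 3: set()}
trace_intramodule(blocks, successors, es, {1})
```

Because a block is re-inserted into IB[m] only when its propagation error source set grows, and the sets are bounded by the items of the module, the iteration terminates regardless of the order in which blocks are selected.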

The proof that Algorithm 5.1 correctly identifies the
intramodule error flow in m implicated by the initial
propagation error source sets of the blocks in m can be found
in [HSIE82]. We now give an example to illustrate this
algorithm.

Example 5.2. Consider the procedure rroots in the program
given in Figure 5.1. Assume the initial error flow in the
procedure rroots is given as IB[rroots] = { 1 }, where ES[1] =
{ rrootsdisc } and ES[b] = ∅ for the remaining blocks b in
rroots, i.e. the input parameter rrootsdisc is the only error
source in the procedure rroots flowing out of the entry block 1
of rroots. The intramodule error flow tracing for the
procedure rroots is then illustrated in Figure 5.3.

Input.  ES[1] = { rrootsdisc }.
        ES[i] = ∅, for i = 2 to 6.

Step 1. Since IB[rroots] = { 1 }, select block 1 from
        IB[rroots], and let IB[rroots] = ∅.

Step 2. Block 2 is the only immediate successor of block 1:
        Since f(1, 2) = { rrootsdisc, rrootsx2 } is not a subset
        of ES[2] = ∅, let ES[2] = { rrootsdisc, rrootsx2 }, and
        IB[rroots] = { 2 }.

Step 1. Select block 2 from IB[rroots], and let IB[rroots] = ∅.

Step 2. Block 3 is the only immediate successor of block 2:
        Since f(2, 3) = { rrootsdisc, rrootsx2, xr1 } is not a
        subset of ES[3] = ∅, let ES[3] =
        { rrootsdisc, rrootsx2, xr1 }, and IB[rroots] = { 3 }.

        . . .

Step 1. Select block 5 from IB[rroots], and let IB[rroots] = ∅.

Step 2. Block 6 is the only immediate successor of block 5:
        Since f(5, 6) = { rrootsdisc, rrootsx2, xr1, xr2 } is not
        a subset of ES[6] = ∅,
        let ES[6] = { rrootsdisc, rrootsx2, xr1, xr2 },
        and IB[rroots] = { 6 }.

Step 1. Select block 6 from IB[rroots], and let IB[rroots] = ∅.

Step 2. Since block 6 does not have any immediate successors,
        go to Step 1.

Step 1. Since IB[rroots] = ∅, terminate.

The final propagation error source sets of the blocks in the
procedure rroots are given as follows:

ES[1] = { rrootsdisc }.

ES[2] = { rrootsdisc, rrootsx2 }.

ES[3] = { rrootsdisc, rrootsx2, xr1 }.

ES[4] = { rrootsdisc, rrootsx2, xr1, xr2 }.

ES[5] = { rrootsdisc, rrootsx2, xr1, xr2 }.

ES[6] = { rrootsdisc, rrootsx2, xr1, xr2 }.


Figure 5.3. Intramodule error flow tracing in rroots in
the program shown in Figure 5.1.

The intramodule error flow model and the intramodule error
flow tracing scheme together model and trace the potential
error flow within a module.

5.1.2 Intermodule Error Flow Model

In this section, the intermodule error flow model is
presented. The intermodule error flow model describes how
potential error sources can propagate between the modules in
the program. Error sources can propagate between the invoking
and invoked modules through parameter passing or data sharing
via module invocation.

A program can be considered as a collection of program
modules. There exists one and only one module in the program
which starts program execution upon invocation by the operating
system. This module is called the main module. Upon
invocation, a module is executed and then the module returns
control to the invoking module at the invocation site upon exit
from the module. The invocation relationships among the
modules in the program can be represented by a call graph of
the program [ALLE74].

Recently, much effort has been devoted to the development
of intermodule data flow analysis techniques with applications
primarily to compiler optimization and static program analysis
[ALLE74], [LOME77], [BART78], [ROSE79], [ARTH81]. Intermodule
data flow information that is used at the point of module
invocation has been called summary data flow information
[ALLE74]. With each module invocation, a summary of the
variables which may be modified, used, or preserved due to this
module invocation will be generated for data flow analysis.

In our logical ripple effect analysis, the intermodule
error flow is modelled utilizing an approach similar to usual
intermodule data flow analysis techniques. The summary error
flow information of a module is called the module error
characteristics of the module, and will be generated to
represent the potential error flow properties of the module.

In order to model the intermodule error flow which occurs
at a module invocation, a sequence of three blocks is
constructed in the invoking module for this invocation. The
first block in the sequence, called an input parameter mapping
block, is used to establish the error flow from the actual
input parameters to their corresponding formal input parameters
of the invoked module. The second block in the sequence,
called an invocation block, is used to reflect the potential
error flow properties of the invoked module represented by the
module error characteristics of the invoked module. The third
block in the sequence, called an output parameter mapping
block, is used to establish the error flow from the formal
output parameters of the invoked module to their corresponding
actual output parameters. The error characteristics of the
three blocks can be updated, after the error characteristics of
the invoked module have been generated, based on the invoked
module's error characteristics and the parameter passing
information associated with this invocation.

5.1.2.1 Module Error Characteristics

To define the module error characteristics of a module m,
it is first necessary to identify the data interface of m
consisting of the items which can interact with the global
environment of m. The data interface of m is represented by
the parameter set of m which is formally described by the
following definition:

Definition 5.3. The parameter set PS[m] of a module m consists
of the formal parameters of m, the item representing the return
value of the module if m is a function, and the data items
which are global to m and are referenced in m or any of m's
invoked modules.

The module error characteristics of a module m can be
formally defined as follows:

Definition 5.4. The module error characteristics of a module m
consist of two sets MC[m] and MP[m], and a mapping MFM[m].
The module source capable set MC[m] of m consists of the items
in the parameter set PS[m] of m each of which can become an
error source due to an invocation of m. The module potential
propagator set MP[m] of m consists of the elements in PS[m]
each of which can implicate some elements in PS[m] as secondary
error sources capable of affecting the global environment of m
due to an invocation of m. The module flow mapping MFM[m] of a
module m is a function from the set MP[m] to the power set of
MC[m]. For each element p of the set MP[m], the subset of
MC[m] which is the image of p under MFM[m] is defined as

MFM[m](p) = { c ∈ MC[m] | p can implicate c as a secondary
error source due to an invocation of m }.

The module error characteristics of a module m provide the
set of items which can become error sources capable of
affecting the global environment of m, and the set of items
which can propagate potential error sources from the global
environment of m to implicate some secondary error sources
capable of affecting the global environment of m, as well as
their implicated secondary error sources due to an invocation
of m. It is clear that the module error characteristics of a
module are necessary for modelling the potential error flow
behavior of the module. To show that they are sufficient,
note first that it is not necessary to include the items which
are not elements of the parameter set PS[m] of a module m in
the module error characteristics of m, because they cannot
interact with the global environment of m. Furthermore, it is
not necessary to include an item x, capable of implicating some
elements in PS[m] as secondary error sources none of which can
affect the global environment of m, in the module potential
propagator set of m, because the error sources implicated by x
cannot affect the global environment of m. Therefore, the
module error characteristics are sufficient to model the
potential error flow properties of a module.

The order in which the error characteristics of the
modules in the program are generated is very important because
the error characteristics of an invoked module can affect those
of its invoking modules. In nonrecursive programs, there is
some ordering, called the reverse invocation order, which has
the property that when modules are examined in this order,
invoked modules are always analyzed in advance of the modules
which invoke them [ALLE74]. Therefore, the error
characteristics of the modules in a nonrecursive program can be
generated following the reverse invocation order. In the case
of recursive programs, there is no ordering with such a
property. Furthermore, the local variables of a recursive
module may exhibit different error flow properties in different
activations of the module because recursive activations of the
module will create separate copies of the module's local
variables, called incarnations of the variables.
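
A reverse invocation order is a topological order of the call graph in which invoked modules precede their invokers. A minimal sketch using the Python standard library is given below; the function name and the dictionary representation of the call graph are assumptions made here, and the call chain in the usage lines follows the modules example, roots and rroots referred to in this chapter.

```python
from graphlib import CycleError, TopologicalSorter


def reverse_invocation_order(calls):
    """Return the modules so that every invoked module precedes its invokers.

    calls : dict mapping each module to the modules it invokes.
    Raises CycleError if the call graph is recursive, in which case
    no reverse invocation order exists.
    """
    # TopologicalSorter treats the dict values as predecessors of the key,
    # so invoked modules are emitted before the modules which invoke them.
    return list(TopologicalSorter(calls).static_order())


order = reverse_invocation_order(
    {"example": ["roots"], "roots": ["rroots"], "rroots": []}
)
```

For a recursive call graph the sorter reports a cycle, which corresponds to the observation above that no such ordering exists for recursive programs.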

Intermodule error flow is also complicated by dynamic
aliasing, which is a problem that occurs when syntactically
distinct names are used to represent the same or overlapping
storage areas at run time. In the presence of dynamic
aliasing, the error characteristics of the modules must be
generated with the consideration of dynamic aliasing
conditions. Dynamic aliasing can be caused by reference
parameters.

Our approach to modelling the intermodule error flow for
nonrecursive programs which do not have any dynamic aliasing
anomalies will be described here. By a dynamic aliasing
anomaly we refer to the problem where either a variable is
passed by reference to more than one formal parameter of a
module, or a global variable which is referenced in a module is
also passed by reference to a formal parameter of that module.
In the presence of a dynamic aliasing anomaly, the formal
reference parameters of the module cannot be treated as
independent entities. Dynamic aliasing anomalies tend to
complicate testing of programs, and hence modern programming
practices advocate the elimination of dynamic aliasing
anomalies [WASS80], [ICHB79]. An approach to handle programs
which have recursion or dynamic aliasing anomalies can be found
in [HSIE82].
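
The two conditions in the definition of a dynamic aliasing anomaly can be checked per invocation site. The sketch below is only illustrative: the function name and its two arguments are hypothetical, and it assumes that the variables passed by reference and the global variables referenced in the invoked module are already known.

```python
def has_aliasing_anomaly(reference_actuals, globals_referenced):
    """Check one invocation site for a dynamic aliasing anomaly.

    reference_actuals  : list of variable names passed by reference,
                         one entry per formal reference parameter
    globals_referenced : set of global variables referenced in the
                         invoked module
    """
    seen = set()
    for variable in reference_actuals:
        # Same variable bound to more than one formal reference parameter.
        if variable in seen:
            return True
        # Global referenced in the module and also passed by reference.
        if variable in globals_referenced:
            return True
        seen.add(variable)
    return False
```

An invocation site for which this check returns False satisfies the assumption under which the formal reference parameters can be treated as independent entities.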

For a nonrecursive program without any dynamic aliasing
anomaly, the error characteristics of the modules in the
program can be generated following the reverse invocation
order. Each module has to be analyzed only once. After the
error characteristics of a module have been generated, the
error characteristics of the external blocks for invocations of
the module are then updated.


5.1.2.2 Generation of Module Error Characteristics

The error characteristics of a module m can be generated
by the following algorithm based on the parameter set PS[m] of
m and the error characteristics of all the blocks in m.


Algorithm 5.2. Identification of Module Error Characteristics

Step 1. Initialize the set MP[m] to be empty.

Step 2. Calculate the set MC[m] by computing (PS[m] ∩ (∪ C[b]
| b is a block in m)).

Step 3. Obtain the set T by computing (PS[m] ∩ (∪ P[b] | b is a
block in m)).


Step 4. If T is empty, then terminate. Otherwise, select an
. . .

Step 1. MP[rroots] = ∅.

Step 2. MC[rroots] = { rrootsdisc, rrootsx1, xr1, xr2, xi }
                     ∩ { rrootsx2, xr1, xr2, xi }
                   = { xr1, xr2, xi }.

Step 3. T = { rrootsdisc, rrootsx1, xr1, xr2, xi }
            ∩ { rrootsdisc, rrootsx1, rrootsx2 }
          = { rrootsdisc, rrootsx1 }.

Step 4. Select rrootsdisc from T, and then let T = { rrootsx1 }.

Step 5. Let IB[rroots] = { 1 }, ES[1] = { rrootsdisc }, and
        ES[i] = ∅, for i from 2 to 6.

Step 6. The set ES[6] obtained by Algorithm 5.1 is
        ES[6] = { rrootsdisc, rrootsx2, xr1, xr2 }.

Step 7. ES[6] ∩ MC[rroots] = { xr1, xr2 }.
        Therefore, let MP[rroots] = { rrootsdisc }, and
        MFM[rroots](rrootsdisc) = { xr1, xr2 }.

Step 4. Select rrootsx1 from T, and then let T = ∅.

Step 5. Let IB[rroots] = { 1 }, ES[1] = { rrootsx1 }, and
        ES[i] = ∅, for i from 2 to 6.

Step 6. The set ES[6] obtained by Algorithm 5.1 is
        ES[6] = { rrootsx1, xr1, xr2 }.

Step 7. ES[6] ∩ MC[rroots] = { xr1, xr2 }.
        Therefore, let MP[rroots] = { rrootsdisc, rrootsx1 },
        and MFM[rroots](rrootsx1) = { xr1, xr2 }.

Step 4. Since T is empty, terminate.

The error characteristics of the procedure rroots identified by
Algorithm 5.2 are as follows:

MC[rroots] = { xr1, xr2, xi },

MP[rroots] = { rrootsdisc, rrootsx1 },

MFM[rroots](rrootsdisc) = MFM[rroots](rrootsx1)
                        = { xr1, xr2 }.

Figure 5.4. The module error characteristics of rroots in
the program shown in Figure 5.1.
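
The procedure traced in Figure 5.4 can be sketched in Python as follows. Since only part of the statement of Algorithm 5.2 is available here, the handling of Steps 4 to 7 in this sketch is inferred from the worked steps of Figure 5.4 and should be read as an assumption, as should the function names, the block dictionary layout (keys "C", "EC", "P", "FM") and the small module used in the usage lines.

```python
def trace(blocks, succ, es, ib):
    """Algorithm 5.1 as a worklist iteration; es is updated in place."""
    while ib:
        b = ib.pop()
        for t in succ.get(b, ()):
            flow = es[b] - blocks[t]["EC"]
            for src in blocks[t]["P"] & es[b]:
                flow |= blocks[t]["FM"].get(src, set())
            if not flow <= es[t]:
                es[t] |= flow
                ib.add(t)


def module_error_characteristics(ps, blocks, succ, entry, exit_block):
    """Return (MC, MP, MFM) of a module, following Figure 5.4."""
    mp, mfm = set(), {}                                        # Step 1
    mc = ps & set().union(*(b["C"] for b in blocks.values()))  # Step 2
    t = ps & set().union(*(b["P"] for b in blocks.values()))   # Step 3
    for x in sorted(t):                                        # Step 4
        # Step 5: x is the only error source leaving the entry block.
        es = {b: set() for b in blocks}
        es[entry] = {x}
        trace(blocks, succ, es, {entry})                       # Step 6
        implicated = es[exit_block] & mc                       # Step 7
        if implicated:
            mp.add(x)
            mfm[x] = implicated
    return mc, mp, mfm


# Hypothetical module with input parameter a and output parameter r.
blocks = {
    1: {"C": set(), "EC": set(), "P": set(), "FM": {}},
    2: {"C": {"t"}, "EC": {"t"}, "P": {"a"}, "FM": {"a": {"t"}}},
    3: {"C": {"r"}, "EC": {"r"}, "P": {"t"}, "FM": {"t": {"r"}}},
}
mc, mp, mfm = module_error_characteristics(
    {"a", "r"}, blocks, {1: [2], 2: [3]}, entry=1, exit_block=3
)
```

In this assumed module, the local item t is excluded from all three results because it is not in the parameter set, while a is recorded as a propagator that implicates r.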

5.1.2.3 Update Block Error Characteristics

Let i, j, and k be a sequence of three external blocks in
a module n for an invocation of m. For the first block i in
the sequence, i.e. the input parameter mapping block, each
formal input parameter x of m should be inserted into the
source capable set C[i] of block i. Each data item y which has
positional correspondence to x in the actual parameter list of
this invocation should be inserted into the potential
propagator set P[i] of block i, while x is inserted into
FM[i](y).

For the second block j in the sequence, i.e. the
invocation block, each element of the module source capable set
MC[m] of m should be inserted into the source capable set C[j]
of block j. Also, each element x of the module potential
propagator set MP[m] of m should be inserted into the potential
propagator set P[j] of block j, while each element of MFM[m](x)
is inserted into FM[j](x).

For the last block k in the sequence, i.e. the output
parameter mapping block, each formal output parameter z of m
should be inserted into the potential propagator set P[k] of
block k. For each formal output parameter z of m, let w be the
data definition and X be the set of usages in the actual
parameter list associated with this invocation which have
positional correspondence to z. Then, w should be inserted
into both the source capable set C[k] of block k and FM[k](z).
Furthermore, each element x of the set X should be inserted
into the potential propagator set P[k] of block k, while w is
inserted into FM[k](x).

Furthermore, for each control usage in the potential
propagator set of each block in the sequence identified by the
intramodule error flow model construction process, the flow
mapping on the control usage is updated to be the source
capable set of the block.

For some programming languages, such as JOVIAL, a formal
parameter of a module can be identified as an input or output
formal parameter based on the syntax rules of the languages.
For other programming languages which cannot distinguish
syntactically between the formal input and output parameters, a
formal parameter x of a module m is an input formal parameter
if x is an element of the module potential propagator set MP[m]
of m. A formal parameter y of m is an output formal parameter
if y is an element of the module source capable set MC[m] of m.

Example 5.4. Consider the invocation of the procedure rroots
in the program shown in Figure 5.1. Blocks 19 to 21 are the
external blocks constructed for this invocation. Let c.1
denote the control item representing the predicate (disc 0).
The block error characteristics of the three blocks identified
by the intramodule error flow construction process are given as
follows:

C[19] = ∅;  P[19] = { c.1 };  FM[19](c.1) = ∅;

C[20] = ∅;  P[20] = { c.1 };  FM[20](c.1) = ∅;

C[21] = ∅;  P[21] = { c.1 };  FM[21](c.1) = ∅;

Based on the module error characteristics of the procedure
rroots described in Figure 5.4, the error characteristics of
the three blocks can be updated as follows:

C[19] = { rrootsdisc, rrootsx1 };

P[19] = { disc, x1, c.1 };

FM[19](disc) = { rrootsdisc };  FM[19](x1) = { rrootsx1 };
FM[19](c.1) = C[19].

C[20] = { xr1, xr2, xi };

P[20] = { rrootsdisc, rrootsx1, c.1 };

FM[20](rrootsdisc) = FM[20](rrootsx1) = { xr1, xr2 };

FM[20](c.1) = C[20].

C[21] = ∅;  P[21] = { c.1 };

FM[21](c.1) = C[21].
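
The update rules for the three external blocks can be sketched in Python and checked against the values listed in Example 5.4. The function names, the argument layout, and the use of dictionaries with keys "C", "P" and "FM" are assumptions made for this sketch; the control usages are assumed to be already present in the potential propagator sets, as in the example.

```python
from collections import defaultdict


def make_block(control):
    """External block whose P set already holds the control usages."""
    return {"C": set(), "P": set(control), "FM": defaultdict(set)}


def update_invocation_blocks(inputs, outputs, mc, mp, mfm, control=()):
    """Return the updated blocks (i, j, k) for one invocation of m.

    inputs  : list of (formal input parameter, set of actual data items)
    outputs : list of (formal output parameter, data definition w,
              set of usages X)
    mc, mp, mfm : module error characteristics MC[m], MP[m], MFM[m]
    """
    i, j, k = (make_block(control) for _ in range(3))
    for formal, actual_items in inputs:       # input parameter mapping
        i["C"].add(formal)
        for y in actual_items:
            i["P"].add(y)
            i["FM"][y].add(formal)
    j["C"] |= mc                              # invocation block
    for x in mp:
        j["P"].add(x)
        j["FM"][x] |= mfm.get(x, set())
    for formal, w, usages in outputs:         # output parameter mapping
        k["P"].add(formal)
        k["C"].add(w)
        k["FM"][formal].add(w)
        for x in usages:
            k["P"].add(x)
            k["FM"][x].add(w)
    for block in (i, j, k):                   # control usages
        for c in control:
            block["FM"][c] = set(block["C"])
    return i, j, k


b19, b20, b21 = update_invocation_blocks(
    inputs=[("rrootsdisc", {"disc"}), ("rrootsx1", {"x1"})],
    outputs=[],
    mc={"xr1", "xr2", "xi"},
    mp={"rrootsdisc", "rrootsx1"},
    mfm={"rrootsdisc": {"xr1", "xr2"}, "rrootsx1": {"xr1", "xr2"}},
    control={"c.1"},
)
```

With these inputs the three returned blocks carry the same sets as the updated characteristics of blocks 19, 20 and 21 in the example.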

5.1.3 Logical Ripple Effect Identification

In this section, the identification of the logical ripple
effect of an initial program modification is described. The
logical ripple effect can be identified in two steps. The
first step is the error flow tracing step which traces the
error flow in the modified program implicated by the primary
error sources. The second step is the logical ripple effect
derivation step which derives the logical ripple effect of the
initial program modification based on the error flow in the
program.

5.1.3.1 Error Flow Tracing

The error flow tracing requires the tracing of error flow
both within modules and between modules. Potential error
sources can propagate from a module m to the modules which
invoke m, and to the modules which are invoked by m. When
there exists error flow from module m to the modules invoked by
m, error sources are said to propagate in a downward direction
with respect to module m. Similarly, when there exists error
flow from module m to the modules which invoke m, error sources
are said to propagate in an upward direction with respect to m.
It is apparent that the downward intermodule error flow with
respect to m must be identified before the upward intermodule
error flow with respect to m is identified; otherwise, the
latter cannot be completely characterized.

Let PRIMESET be the primary error source set of a program,
in which each element (m, b, x) denotes that x is a primary
error source at block b in module m. The error flow tracing
identifies the modules, blocks, and items which are implicated
by the error flow caused by the primary error source set
PRIMESET.

5.1.3.2 Intermodule Error Flow Tracing

The existence of the upward intermodule error flow from a
module n can be identified as follows: A module n can
propagate error sources upward to each module which invokes n
via each invocation of n if and only if (MC[n] ∩ ES[v]) ≠ ∅,
where ES[v] is the propagation error source set of the exit
block v of n and MC[n] is the module source capable set of n.
The elements of (MC[n] ∩ ES[v]) are used to update the
propagation error source set of each invocation block
constructed for an invocation of n such that the error flow
implicated by these upward intermodule error sources can be
traced.

The presence of the downward intermodule error flow from a
module n to an invoked module m via an invocation of m can be
identified as follows: Suppose that a module m is invoked in
a module n and b is the input parameter mapping block
constructed in n for this invocation. Given the propagation
error source set ES[b] of b, n can propagate potential errors
to m via this invocation if (ES[b] ∩ MP[m]) ≠ ∅, where MP[m] is
the module potential propagator set of m. The elements of
(ES[b] ∩ MP[m]) are used to update the propagation error source
set of the entry block of module m such that the error flow
implicated by these downward intermodule error sources can be
traced.
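
Both tests reduce to a set intersection, as the following Python sketch shows. The function names are hypothetical; the sets in the usage lines are taken from the module error characteristics of rroots in Figure 5.4, with the propagation error source sets assumed for illustration.

```python
def upward_sources(mc_n, es_exit):
    """Error sources flowing upward out of n: MC[n] & ES[v]."""
    return mc_n & es_exit


def downward_sources(es_input_block, mp_m):
    """Error sources flowing downward into m: ES[b] & MP[m]."""
    return es_input_block & mp_m


# xi reaches the exit block of rroots and is source capable in rroots.
up = upward_sources({"xr1", "xr2", "xi"}, {"xi"})

# rrootsdisc flows out of the input parameter mapping block and is a
# potential propagator of rroots; disc itself is not.
down = downward_sources({"disc", "rrootsdisc"}, {"rrootsdisc", "rrootsx1"})
```

An empty result in either case means that no intermodule error flow exists in that direction for the invocation considered.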

5.1.3.3 Error Flow Tracing Algorithm

The areas in a program which are implicated by the error
flow in the program are identified in a stepwise manner. The
primary error source set PRIMESET is used to initialize the
propagation error source sets and the initial error source
block sets. The intramodule error flow in the modules involved
in initial modification is then traced based on the initial
propagation error source sets of the blocks in the modules.
After the intramodule error flow in a module m stabilizes, the
intermodule error flow originating at m implicated by the error
flow is then identified based on these propagation error source
sets, and used to update the propagation error source sets of
the blocks in the modules to which the intermodule error flow
is propagated. The modules which are implicated by the
intermodule error flow are then analyzed. This process
continues until the error flow stabilizes, i.e. no new error
sources are identified.

An algorithm has been developed for identifying the
program areas which are implicated by the error flow caused by
the primary error source set PRIMESET. Let AFFECTM be the set
of modules which are implicated by the error flow. In this
algorithm, a set UPM is used to contain the modules potentially
affected by the upward intermodule error flow, and a set DOWNM
is used to contain the modules potentially affected by the
downward intermodule error flow. This algorithm is given
below.

Algorithm 5.3. Error Flow Tracing

Step 1. Initialize the sets AFFECTM and UPM to be empty.
For each module m in the program, initialize the set IB[m] to
be empty. For each block b in the program, initialize the set
ES[b] to be empty.

Step 2. For each element (m, b, x) of the set PRIMESET, insert
x into the set ES[b]. Furthermore, insert b into IB[m], and m
into the sets AFFECTM and UPM.

Step 3. If UPM is empty, then terminate. Otherwise, select a
module from UPM and delete it from UPM. Let n denote the
selected module.

Step 4. Identify the intramodule error flow in n utilizing
Algorithm 5.1.

Step 5. Calculate T = (ES[v] ∩ MC[n]), where v is the exit
block of n. If T is not empty, then for each invocation block
b in a module k constructed for an invocation of n, check if T
is a subset of the propagation error source set ES[b] of b. If
it is not, i.e. new error sources flow out of n upward to k
via this invocation, then insert k into AFFECTM and UPM, and b
into IB[k]. Furthermore, let ES[b] = (ES[b] ∪ T).

Step 6. Let DOWNM = ∅. Then, for each input parameter mapping
block b in n for an invocation of some module m, calculate a
set T = (ES[b] ∩ MP[m]). Check if T is a subset of the
propagation error source set ES[u] of the entry block u of
module m. If it is not, i.e. new error sources flow into m,
then insert m into AFFECTM and DOWNM, and u into IB[m].
Furthermore, let ES[u] = (ES[u] ∪ T).

Step 7. If DOWNM is not empty, then select a module from DOWNM
and delete it from DOWNM. Let j denote the selected module.
Repeat Steps 4 and 6 with j substituted for n to trace the
intramodule error flow in j and the downward intermodule error
flow propagated from j. This process continues until the set
DOWNM becomes empty, i.e. the error flow implicated by the
downward intermodule error flow originating at n stabilizes.
Then go to Step 3 to trace the error flow implicated by the
upward intermodule error flow from the modules in the set UPM.
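
The steps above can be sketched in Python as follows. This is one reading of Algorithm 5.3 under stated assumptions: block identifiers are unique across the program, each invocation is recorded as a triple (input parameter mapping block, invocation block, invoked module), DOWNM is reset only when Step 6 is applied to the module n selected in Step 3, and every block in the usage lines is a hypothetical pass-through block rather than a block of Figure 5.1.

```python
def intramodule(mod, es, ib_m):
    """Algorithm 5.1 for one module; es and ib_m are updated in place."""
    while ib_m:
        b = ib_m.pop()
        for t in mod["succ"].get(b, ()):
            blk = mod["blocks"][t]
            flow = es[b] - blk["EC"]
            for src in blk["P"] & es[b]:
                flow |= blk["FM"].get(src, set())
            if not flow <= es[t]:
                es[t] |= flow
                ib_m.add(t)


def trace_error_flow(program, primeset):
    """Return (AFFECTM, ES) for the primary error source set."""
    es = {b: set() for m in program.values() for b in m["blocks"]}  # Step 1
    ib = {name: set() for name in program}
    affectm, upm = set(), set()
    for m, b, x in primeset:                                        # Step 2
        es[b].add(x)
        ib[m].add(b)
        affectm.add(m)
        upm.add(m)

    def push_down(name, downm):                                     # Step 6
        for in_blk, _inv_blk, callee in program[name]["calls"]:
            t = es[in_blk] & program[callee]["MP"]
            u = program[callee]["entry"]
            if not t <= es[u]:
                es[u] |= t
                ib[callee].add(u)
                affectm.add(callee)
                downm.add(callee)

    while upm:                                                      # Step 3
        n = upm.pop()
        intramodule(program[n], es, ib[n])                          # Step 4
        t = es[program[n]["exit"]] & program[n]["MC"]               # Step 5
        if t:
            for k, mod in program.items():
                for _in_blk, inv_blk, callee in mod["calls"]:
                    if callee == n and not t <= es[inv_blk]:
                        es[inv_blk] |= t
                        ib[k].add(inv_blk)
                        affectm.add(k)
                        upm.add(k)
        downm = set()
        push_down(n, downm)
        while downm:                                                # Step 7
            j = downm.pop()
            intramodule(program[j], es, ib[j])
            push_down(j, downm)
    return affectm, es


def blk():
    """Hypothetical pass-through block with empty characteristics."""
    return {"EC": set(), "P": set(), "FM": {}}


program = {
    "rroots": {"blocks": {5: blk(), 6: blk()}, "succ": {5: [6]},
               "entry": 5, "exit": 6, "MC": {"xr1", "xr2", "xi"},
               "MP": {"rrootsdisc", "rrootsx1"}, "calls": []},
    "roots": {"blocks": {19: blk(), 20: blk(), 21: blk(), 25: blk()},
              "succ": {19: [20], 20: [21], 21: [25]},
              "entry": 19, "exit": 25, "MC": {"xr1", "xr2", "xi"},
              "MP": set(), "calls": [(19, 20, "rroots")]},
    "example": {"blocks": {29: blk(), 30: blk(), 31: blk()},
                "succ": {29: [30], 30: [31]}, "entry": 29, "exit": 31,
                "MC": set(), "MP": set(), "calls": [(29, 30, "roots")]},
}
affectm, es = trace_error_flow(program, [("rroots", 5, "xi")])
```

In this simplified program the primary error source xi propagates upward from rroots through roots to example, and no downward intermodule error flow arises.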

The proof that Algorithm 5.3 correctly identifies the
areas in a program which are implicated by the error flow in
the program caused by the primary error source set PRIMESET is
given in [HSIE82]. We now give the following example to
illustrate this algorithm.

Example 5.5. Consider the program shown in Figure 5.1. Assume
that the initial modification corrected the definition of xi in
the procedure rroots, i.e. the primary error source set of the
program is given by PRIMESET = { (rroots, 5, xi) }. The error
flow tracing by Algorithm 5.3 is illustrated in Figure 5.5.

Step 1. Let AFFECTM = ∅, and UPM = ∅.
        Let ES[i] be empty, for each block i.

Step 2. Let ES[5] = { xi }, IB[rroots] = { 5 }, and
        UPM = { rroots }, AFFECTM = { rroots }.

Step 3. Select rroots from UPM, and then let UPM = ∅.

Step 5. Since (ES[6] ∩ MC[rroots]) = { xi }, let
        UPM = { roots }, AFFECTM = { rroots, roots },
        ES[20] = { xi }, and IB[roots] = { 20 }.

Step 6. Since rroots has no immediate successors,
        go to Step 3.

Step 3. Select roots from UPM, and then let UPM = ∅.

Step 5. Since (ES[25] ∩ MC[roots]) = { xi }, let
        UPM = { example }, IB[example] = { 30 },
        AFFECTM = { rroots, roots, example }, and
        ES[30] = { xi }.

Step 6. Since no downward intermodule error flow from roots,
        go to Step 3.

Step 3. Select example from UPM, and then let UPM = ∅.

Step 5. Since no upward intermodule error flow from example,
        UPM = ∅.

Step 6. Since no downward intermodule error flow from example,
        go to Step 3.

Step 3. Since UPM is empty, terminate.

The result of error flow tracing is as follows:
AFFECTM = { example, roots, rroots }.

Figure 5.5. Error flow tracing in the program shown in
Figure 5.1.

5.1.3.4 Logical Ripple Effect Derivation

We will use RIPPLE[b] to denote the set of items in a
block b which are affected by the logical ripple effect,
RIPPLEB[m] the set of blocks in a module m which are affected
by the logical ripple effect, and RIPPLEM the set of modules in
a program which are affected by the logical ripple effect.

To derive the logical ripple effect from the error flow,
it is first observed that a block may not be affected by the
logical ripple effect even though the propagation error source
set resulting from the error flow is not empty. This can be
true if the elements in the propagation error source set are
error sources which just pass through this block. Furthermore,
it can easily be shown that an item x is affected by the
logical ripple effect in block b only if x ∈ (ES[b] ∩ C[b]).

Given the set AFFECTM and the propagation error source
sets, which are derived in the error flow tracing step, the
first step in logical ripple effect derivation is as follows:
For each module m in the set AFFECTM, first initialize
RIPPLEB[m] to be an empty set. Then, for each block b in m,
check if (ES[b] ∩ C[b]) is empty. If it is not, then let
RIPPLE[b] = (ES[b] ∩ C[b]), and insert b into RIPPLEB[m].

Next, it is observed that a module m may not be affected
by the logical ripple effect despite the fact that m is
implicated by the error flow. This can happen when all the
error sources in m are just passing through m to the modules
invoked by m without internally generating error sources in m.
It is obvious that a module m is not affected by the logical
ripple effect if all the blocks in the set RIPPLEB[m] are
external blocks. Therefore, the next step in the logical
ripple effect derivation is to identify the subset RIPPLEM of
the modules in the set AFFECTM which have at least one local
block in their RIPPLEB[m] sets.

In an analogous manner, a block b with a nonempty set RIPPLE[b] may not be affected by the logical ripple effect if the block is an external block constructed for an invocation of a module which is not an element of the set RIPPLEM. Therefore, the final step in the logical ripple effect derivation is, for each module m in the set RIPPLEM, to remove the blocks b in the set RIPPLEB[m] which are constructed for invocations of modules which are elements of the set (AFFECTM - RIPPLEM).

Now, the set RIPPLEM gives the set of modules in a program which are affected by the logical ripple effect. For each module m in the set RIPPLEM, the set RIPPLEB[m] gives the set of blocks in m which are affected by the logical ripple effect. For each block b in the set RIPPLEB[m], the set RIPPLE[b] gives the set of error sources in b which may cause logical inconsistencies with the initial modification.


Example 5.6. Consider the program shown in Figure 5.1. The result of error flow tracing in the program has been shown in Figure 5.5. Since the only block in the procedure roots is an external block, the procedure roots is not affected by the logical ripple effect. Hence, the procedure roots is not included in the set RIPPLEM. Furthermore, block 30 in the main module example is constructed for an invocation of the procedure roots, which is not affected by the logical ripple effect. Hence, block 30 is eliminated from the set RIPPLEB[example]. The logical ripple effect is thus given as follows:

RIPPLEM = { example, moots }.

RIPPLEB[example] = { 32 }.

RIPPLEB[moots] = { 5 }.

RIPPLE[32] = { output }.

RIPPLE[5] = { xi }.

In this section, a scheme to identify the logical ripple effect based on the set of primary error sources has been presented. Note that this scheme illustrates the concept of logical ripple effect identification. A more efficient algorithm can be found in [HSIE82].
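The three derivation steps can be sketched in a few lines of Python. This is only an illustration of the scheme as described above, not the more efficient algorithm of [HSIE82]. The `Block` record, the `invokes` field used to tell local blocks from external ones, the block number 2 in roots, and the contents of the ES and C sets are all assumptions chosen so that the output has the same shape as Example 5.6; they are not taken from Figure 5.5.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Block:
    es: set                         # ES[b]: propagation error source set of the block
    c: set                          # C[b]: the set intersected with ES[b] in the text
    invokes: Optional[str] = None   # None = local block, else the module an external block invokes


def derive_ripple(modules, affectm):
    """modules: {module name: {block id: Block}}; affectm: set of module names."""
    ripple, rippleb = {}, {}
    # Step 1: RIPPLE[b] = ES[b] intersected with C[b]; keep b when the result is nonempty.
    for m in affectm:
        rippleb[m] = set()
        for b, blk in modules[m].items():
            items = blk.es & blk.c
            if items:
                ripple[(m, b)] = items
                rippleb[m].add(b)
    # Step 2: RIPPLEM = modules with at least one local block in RIPPLEB[m].
    ripplem = {m for m in affectm
               if any(modules[m][b].invokes is None for b in rippleb[m])}
    # Step 3: drop external blocks built for invocations of modules in AFFECTM - RIPPLEM.
    passthrough = affectm - ripplem
    rippleb = {m: {b for b in rippleb[m] if modules[m][b].invokes not in passthrough}
               for m in ripplem}
    ripple = {(m, b): items for (m, b), items in ripple.items()
              if m in rippleb and b in rippleb[m]}
    return ripplem, rippleb, ripple


# Hypothetical data shaped after Example 5.6 (set contents are invented).
modules = {
    "example": {30: Block({"a"}, {"a"}, invokes="roots"),
                32: Block({"output", "a"}, {"output"})},
    "roots": {2: Block({"a"}, {"a"}, invokes="moots")},
    "moots": {5: Block({"xi", "a"}, {"xi"})},
}
ripplem, rippleb, ripple = derive_ripple(modules, {"example", "roots", "moots"})
print(ripplem, rippleb, ripple)
```

Keying RIPPLE by (module, block) pairs is a design choice of this sketch, made because block numbers may repeat across modules.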

The intramodule error flow model, the intermodule error flow model, and the logical ripple effect identification scheme together provide a model based on which the logical ripple effect can be identified. In the next section, the overall logical ripple effect analysis technique will be presented.

5.1.4 Logical Ripple Effect Analysis Technique

The  logical  ripple  effect  analysis  technique  can  now  be 
summarized  as  follows: 

Step 1. Construct the intramodule error flow model as described in Section 5.1.1.

Step 2. Construct the intermodule error flow model as described in Section 5.1.2.

Step 3. Identify the primary error source set PRIMESET based on the initial program modification.

Step 4. Identify the logical ripple effect of the initial program modification as described in Section 5.1.3.

Steps 1, 2, and 4 of the logical ripple effect analysis technique can be automated without difficulty. However, the identification of primary error sources is more complicated, and the automation of this process is not simple. We will now discuss this step in more detail.

The primary error sources are identified to transform the initial program modification into the changes to the error flow of a program. To illustrate the identification of the primary error sources, let us consider the following types of initial program modifications:

(1) Suppose that a control condition was modified by changes to the data usages, relational operators, or constants in this control condition. The control definition associated with this control condition is then specified as a primary error source at the block which contains the control condition.

(2) Suppose that a data definition was changed or added in a block. The definition is then specified as a primary error source at the block.

(3) Suppose that a data definition was deleted. The definition is then specified as a primary error source at the block to which the original definition transferred control. Furthermore, if any definition in the block is defined with a usage of the deleted data definition, then the definition is also specified as a primary error source at the block.

(4) Suppose that an actual parameter x was replaced by y in a module invocation. If the corresponding formal parameter f is an input parameter, then f is specified as a primary error source at the input parameter mapping block for this invocation. If f is an output parameter, then x and y are both specified as primary error sources at the output parameter mapping block for this invocation.

(5) Suppose that a module invocation which invokes a newly added or an existing module was inserted into the program. The elements of the module source capable set of the invoked module are then specified as primary error sources at the invocation block for this newly added module invocation.

(6) Suppose that a module invocation n was deleted from a module m. The elements of the module source capable set of the invoked module, with the formal output parameters substituted by their corresponding actual parameters in the deleted module invocation, are then specified as primary error sources at the block to which the deleted module invocation transferred control.

(7) Suppose that an unconditional goto statement s1 which branches to a statement s2 was deleted from the program. Let s3 be the statement which followed the statement s1 in the original program. The data definitions which could reach s2 before the deletion of s1 but cannot reach s2 after the deletion of s1 are identified as primary error sources flowing into the statement s2. Furthermore, the data definitions which could not reach s3 before the deletion of s1 but can after the deletion of s1 are identified as primary error sources flowing into the statement s3.
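Rule (7) amounts to two set differences over reaching definitions. The sketch below assumes that the sets of data definitions reaching each statement have already been computed, before and after the deletion, by a separate data flow analysis. The function name, the dictionary layout, and the definition labels such as "x@1" are hypothetical.

```python
def goto_deletion_sources(reach_before, reach_after, s2, s3):
    """Primary error sources for a deleted 'goto s2' that was followed by s3.

    reach_before / reach_after: {statement: set of data definitions reaching it}
    computed on the original and on the modified program respectively.
    """
    # Definitions that used to reach s2 but no longer do.
    into_s2 = reach_before[s2] - reach_after[s2]
    # Definitions that could not reach s3 before but can now.
    into_s3 = reach_after[s3] - reach_before[s3]
    return into_s2, into_s3


# Hypothetical reaching-definition sets; "x@1" means "definition of x at statement 1".
reach_before = {"s2": {"x@1", "y@2"}, "s3": {"z@4"}}
reach_after = {"s2": {"y@2"}, "s3": {"z@4", "x@1"}}
into_s2, into_s3 = goto_deletion_sources(reach_before, reach_after, "s2", "s3")
print(into_s2, into_s3)
```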

Our current logical ripple effect analysis technique requires the maintenance programmers to identify the primary error sources manually. Further work is needed to automate this process.

5.1.5 Experiments

A prototype system to perform logical ripple effect analysis on PASCAL programs has been developed. This system consists of three subsystems: an intramodule error flow analyzer, an intermodule error flow analyzer, and a logical ripple effect identification subsystem. The identification of primary error sources must be performed manually by the maintenance programmers.

The intramodule error flow analyzer was developed by modifying an existing standard PASCAL compiler, while the other two subsystems are newly developed. The prototype system is currently running on a DEC VAX-11/780 computer under the VMS operating system. The system is primarily written in VAX-11 PASCAL, while some file handling routines are written in VAX-11 FORTRAN. The intramodule error flow analyzer and the intermodule error flow analyzer are run in batch mode, while the logical ripple effect identification subsystem can be run in either batch or interactive mode. The program sizes of the intramodule error flow analyzer, intermodule error flow analyzer, and logical ripple effect identification subsystem are 643, 190, and 230 disc blocks, respectively, where each disc block under the VMS operating system consists of 512 bytes.

During the logical ripple effect identification step, the user can specify the modules whose internal error flow will not be traced. For such a module, the upward error flow originating at this module will still be traced, but the downward error flow originating at this module will not be traced. Also, during interactive logical ripple effect identification, the user can remove an item from the error flow at a block so that further error flow implicated by this item is not traced. This feature enables the user to control the scope of error flow tracing. For example, the user can choose to trace only the intramodule error flow of a module which is involved in an initial program modification. Also, it can be used to reduce the scope of error flow tracing, and hence provide the user with more precise information about the potential logical inconsistencies. One example of a module which is not traced can be an output routine which converts a data item from one format to another, while the routine itself is not modified. There are certain messages displayed on the terminal which can help the user better understand the error flow in the program implicated by the initial program modification.

We have applied our logical ripple effect analysis technique to PASCAL programs with sizes ranging from about 50 to 5000 lines of program statements and declarations. Based on our experiments, the execution time needed for the error flow analysis of a program depends on the program size. However, the response time for the interactive logical ripple effect identification is not significantly affected by the program size, but rather by the size and complexity of the modules in the program, because the logical ripple effect identification is performed on a module-by-module basis.

Our experiments indicate that our logical ripple effect analysis technique can be very effective for scientific programs, which require extensive numerical computation. The logical ripple effect of an initial program modification follows very closely the data flow in this type of program. For other types of programs, the effectiveness of this technique is limited by the underlying data flow analysis technique. For example, since the data flow analysis cannot distinguish distinct components of a complex data structure, the whole data structure is treated as modified if a particular component in the data structure is modified. This implies that all the program blocks which use different components of the data structure would be identified as affected by the ripple effect of the modification to a particular component in the data structure.
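The loss of precision can be seen in a small sketch. It contrasts the whole-structure granularity described above with a component-level granularity. The component-level mode is shown only for comparison and is not part of the technique in this report. The block numbers, the record `rec`, and the function name are hypothetical.

```python
def affected_blocks(uses, modified, component_level):
    """uses: {block: set of (variable, component) pairs read in that block}.

    modified: the (variable, component) pair that was changed.
    component_level=False treats the whole data structure as modified.
    """
    var, comp = modified
    hit = set()
    for block, reads in uses.items():
        for v, c in reads:
            if v == var and (not component_level or c == comp):
                hit.add(block)
    return hit


# Hypothetical blocks reading components of a record 'rec' and a scalar 'n'.
uses = {
    10: {("rec", "x")},
    11: {("rec", "y")},
    12: {("rec", "x"), ("n", None)},
    13: {("n", None)},
}
coarse = affected_blocks(uses, ("rec", "x"), component_level=False)
fine = affected_blocks(uses, ("rec", "x"), component_level=True)
print(sorted(coarse), sorted(fine))
```

Block 11 reads only `rec.y`, yet it is implicated under the whole-structure view.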

5.1.6 Discussion And Future Work

Our current logical ripple effect analysis technique requires the maintenance programmer to identify the primary error sources manually, and requires the program to be reanalyzed after each initial program modification to construct the error flow model of the modified program. The efficiency and ease of use of the logical ripple effect analysis technique can be improved by developing a scheme which can incrementally update the error flow model of the program and a scheme which can automatically identify the primary error sources.

The error flow model of a program can be incrementally updated in two steps: updating the intramodule error flow model and updating the intermodule error flow model. Since the intramodule error flow model can be constructed by an extended parser of the source language, the changes to the intramodule error flow model can be identified by an extended incremental attribute grammar evaluator [DEME81] which performs incremental attribute reevaluation. An incremental attribute grammar evaluator can function together with a syntax-directed editor which can incrementally reevaluate the syntactic information of the program. The intermodule error flow model can then be incrementally updated by modifying the construction step of the intermodule error flow model to eliminate the analysis of the module error characteristics of a module if neither the module nor any successor of the module is involved in the initial program modification.
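The selection of modules whose module error characteristics must be recomputed can be sketched as a reverse reachability search over the invocation graph. The sketch assumes that "successor" is meant transitively, which the text does not state explicitly. The module names and the `calls` layout are hypothetical.

```python
def modules_to_reanalyze(calls, modified):
    """calls: {module: set of modules it invokes}; modified: modules touched by the change.

    Returns every module that is modified or has a modified module among its
    (transitive) successors; all other modules keep their previous analysis.
    """
    # Reverse the invocation graph: invoked module -> invoking modules.
    callers = {}
    for m, callees in calls.items():
        for c in callees:
            callers.setdefault(c, set()).add(m)
    dirty, stack = set(), list(modified)
    while stack:
        m = stack.pop()
        if m in dirty:
            continue
        dirty.add(m)
        stack.extend(callers.get(m, ()))
    return dirty


# Hypothetical invocation graph.
calls = {
    "main": {"parse", "report"},
    "parse": {"scan"},
    "report": {"format"},
    "scan": set(),
    "format": set(),
}
dirty = modules_to_reanalyze(calls, {"scan"})
print(sorted(dirty))
```

Here `report` and `format` are skipped because neither they nor their successors were modified.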

5.2 The Performance Ripple Effect Analysis Technique

Since a large-scale program usually possesses both functional and performance requirements, the ripple effect of program modifications must be analyzed from both a functional and a performance point of view. In many large-scale programs, the violation of a performance requirement is equivalent to a system error and thus requires further corrective action [BOYD78], [WEGN78], [SWAN76], [BELF77]. Consequently, in the maintenance process it is important to fully understand the potential effect of a modification to the system in terms of the performance of the parts of the system directly involved in the modification, as well as those that may be affected indirectly. The change in performance of these parts may then have an impact on the performance of the other parts of the system.

In the previous contract, we developed a performance ripple effect analysis technique which was reported in detail in [YAU80c, 80f]. This technique is based on the identification of performance attributes, critical sections, performance propagation mechanisms, and interdependency relationships among modules, as well as the relations between user performance requirements and module performance attributes. Algorithms for identifying those items and an algorithm for tracing the performance ripple effect have been established. During this project period, we have constructed a prototype system for the demonstration of our performance ripple effect analysis technique. In the following section, we will discuss our experimental results.

5.2.1 Experimentation

This prototype system, which has been developed to demonstrate our performance ripple effect analysis for PASCAL [JENS74] programs, is made up of two subsystems: a program text analyzer, which constructs a model of the program for tracing performance ripple effects, and a performance ripple effect tracing subsystem. Since PASCAL programs involve no concurrent operations, not all of the performance attributes, critical sections, virtual performance attributes and the relationships among them could be shown. Therefore, the program text analyzer constructs only those portions of the model which are relevant to PASCAL programs, although the subsystem to trace the effects of program changes can also trace these effects on programs which include those portions of the model which are associated with concurrent operations.

The program text analyzer was developed by modifying an existing PASCAL compiler and consists of over 7000 lines of PASCAL code, while the tracing subsystem was newly developed and consists of about 2500 lines of PASCAL code. The prototype system is currently running under the VMS operating system on a DEC VAX-11/780 computer. This system is written entirely in VAX-11 PASCAL. Both subsystems run without user interaction, the first constructing the performance ripple effect model of the program, and the second tracing the effects of a modification through the entire program.

Since the logical correctness of a software system is at least as important as its ability to meet performance requirements, we will assume that an analysis of logical ripple effects precedes that of performance ripple effects. This allows us to take advantage of the data flow analysis performed by the logical ripple effect identification subsystem.

To assist us in validating the results of our performance ripple effect analysis, we developed a technique to estimate the execution time of arbitrary paths in the programs being modified. This technique was described in [YAU81b]. We compared the estimated execution times of all critical sections of the program, before and after the modification, and observed that all quantitative changes in estimated times appeared in critical sections that were implicated by our performance ripple effect analyzer.

5.2.2 Discussion

During the early stages of the maintenance process, the performance ripple effect analysis technique can be used as an aid in developing criteria for maintenance personnel to evaluate proposed program modifications from a performance perspective. Basically, this involves the worst-case identification of performance requirements which might be affected by the program modifications.
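One way to picture the worst-case identification is as a transitive closure over the performance dependency relationships: every attribute that depends, directly or indirectly, on a changed attribute is implicated, and so is every requirement tied to an implicated attribute. The sketch below is an illustration under that reading and is not the tracing algorithm of the prototype. The attribute names, requirement labels, and dictionary layout are hypothetical.

```python
from collections import deque


def worst_case_affected(dependents, changed, requirement_of):
    """dependents: {attribute: set of attributes that depend on it}.

    changed: attributes directly involved in the modification.
    requirement_of: {attribute: set of user performance requirements tied to it}.
    """
    seen, queue = set(changed), deque(changed)
    while queue:
        x = queue.popleft()
        for y in dependents.get(x, ()):
            if y not in seen:       # implicate y because it depends on x
                seen.add(y)
                queue.append(y)
    reqs = {r for a in seen for r in requirement_of.get(a, ())}
    return seen, reqs


# Hypothetical dependency relationships among execution-time attributes.
dependents = {"t_sort": {"t_report"}, "t_report": {"t_total"}, "t_io": {"t_total"}}
requirement_of = {"t_total": {"R1"}, "t_io": {"R2"}}
attrs, reqs = worst_case_affected(dependents, {"t_sort"}, requirement_of)
print(sorted(attrs), sorted(reqs))
```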

After a program modification has been selected and completely implemented, the performance ripple effect analysis technique can substantially refine its analysis and determine more accurately which performance requirements may have been affected by the program modifications. These performance requirements can then become the targets for retesting. This is accomplished by determining whether or not a performance attribute is actually affected before implicating other performance attributes involved in a performance dependency relationship with the given attribute. In other words, if a dependency relationship exists between performance attributes x and y, performance attribute y does not need to be examined for changes if it has been determined that performance attribute x is not affected by the maintenance activity. Thus, the preliminary results of some of the early retesting efforts may be decisive in determining the scale of retesting which remains

6.0 EFFECTIVE TESTING FOR SOFTWARE MAINTENANCE

Despite the use of automated tools to assist the maintenance programmer in making modifications correctly, the possibility of error remains, and so the modified program must still be retested. Nonetheless, we would like to avoid retesting the entire program if only a minor modification has been made.

We have developed a module testing approach which makes use of existing test cases whenever possible, and uses the input partition method [RICH81] for constructing new test cases when they are required. Actual testing is done by symbolic execution [KING76], but we make use of real test case data to select the control flow paths to be executed. We have demonstrated an implementation of our approach using ANSI FORTRAN by modifying the ATTEST system [CLAR76]. Our method is effective in testing programs with mathematical computations whose specifications can be given in the cause/effect manner [MYER76], [GOOD75], [HALL78], [HENI80], such as control programs for aircraft control systems and nuclear power plant control systems. Although our method has been demonstrated for programs written in FORTRAN, it can easily be modified for programs written in block-structured languages, such as PASCAL, PL/1 and ALGOL. The application of this method will also be discussed.

6.1 The Module Revalidation Technique

In this section, we will present our module revalidation technique for the maintenance phase. The smallest unit in the program we consider here as a modified section is a program section, which is a maximal set of ordered statements of a program that can only be executed as follows: its execution starts from the first statement, terminates at the last statement, and all of its statements are executed in the given order. In addition, we assume that each module in the program has one entry and multiple exits. Our technique is applied only after all the necessary modifications of the module are completed. It is assumed that the module before the modification has been tested by the test set T = {t1, t2, ..., tn}, where T was generated by any test generation method, each test case ti = <v1, v2, ..., vm>, i = 1, 2, ..., n, and vj, j = 1, 2, ..., m, is an input value for the jth input variable of the module. Furthermore, we assume that the specification of each module is correct and given in the form of a cause/effect graph [MYER76].

The module revalidation technique can be summarized in the flow-chart shown in Figure 6.1. To start with, the derivation of the input partition for the modified module will be done to reflect the changes in the program code and/or specification. Then, the original test cases of T which are still correct inputs for the modified module are kept and all other original test cases of T are discarded. If the use of the test cases in T does not satisfy the criterion of the input partition method, which requires at least one test case in each partition class, additional test cases are generated to satisfy the criterion. After the criterion is satisfied, all the newly generated test cases and the original test cases whose execution leads to any modified portion of the module are executed, and the results of the execution are examined. When the existence of errors is detected, debugging of the module will be performed. In the remainder of this section, we will discuss each of these processes in detail.

    Derive the input partition for the modified module
        |
    Keep correct test cases in T and discard other test cases
        |
    Assign correct test cases to the input partition classes for the modified module
        |
    ... cases and previous test cases whose execution exercises any modified section of the modified module
        |
    Output validation
        |
    Debugging

Figure 6.1. An overview of module revalidation.


6.1.1 Derivation Of The Input Partition


The input partition P used in our method is derived by intersecting two input partitions Ps and Pc, which are generated from the program specification and code respectively, and our testing criterion is to have at least one test case in each partition class of P. This input partition has also been considered by Weyuker and Ostrand [WEYU80], who used English for the program specification, and Richardson and Clarke [RICH81], who used a Program Design Language (PDL) type specification. As mentioned before, we used the cause/effect graph to represent the program specification. The partition Ps can be generated by considering all possible combinations of input conditions from the cause/effect graph, where each combination corresponds to a partition class. The partition Pc can be generated by considering that each distinct executable path in a module corresponds to a distinct partition class, except that those paths which differ only in the iteration number of the same loop belong to the same class.

To illustrate the input partition method, let us consider the program which computes the average of a given array of numbers and returns its absolute value. The specification of the program is given in the cause/effect graph shown in Figure 6.2. Its code, together with the program graph information, is shown in Figure 6.3. The cause/effect graph is used to give the input/output relations of the module. Circles on the left correspond to causes, which denote input conditions for the module, and circles on the right correspond to effects, which denote outputs of the module. Circles in the middle denote intermediate nodes, which are used to specify combinations of causes by means of logical relations, such as AND, OR and NOT. Causes (or combinations of causes) and effects are connected if there exist relations: if causes (or combinations of causes) are given in the module, the effects are returned by the module.

    Program    Statement
    Block No.  No.         SOURCE

       1          1              SUBROUTINE SUB1 (A,I,R,IERR)
       1          1              DIMENSION A(5)
                  2              IERR = 0
       2          3              IF (I .LE. 0)
      13          4           *  GO TO 10
       3          5              IF (I .LE. 1)
      12          6           *  GO TO 20
                  7              R = 0
       4          8              DO 30 L = 1,I
       5          9        30    R = R + A(L)
       6         10              R = R/I
       7         11        40    IF (R .GT. 0)
      11         12           *  GO TO
       8         13              R = -R
                 14        50    WRITE (6,100) R
                          100    FORMAT (X, F3.2)
       9         15              GO TO 60
                 16        20    R = A(1)
      12         17              GO TO 40
      13         18        10    IERR = 1
      10         19        60    RETURN


represents a partition class of Ps, and Ps = (I≤0) ∪ (1≤I≤5 and S>0) ∪ (1≤I≤5 and S≤0). To generate Pc, we first find the following five different kinds of paths in the program: the path to handle the case I≤0, the path to handle the case I=1 and S≤0, the path to handle the case I=1 and S>0, the paths to handle the case 2≤I≤5 and S≤0, and the paths to handle the case 2≤I≤5 and S>0. Based on this grouping of paths, Pc can be derived as follows: Pc = (I≤0) ∪ (2≤I≤5 and S>0) ∪ (2≤I≤5 and S≤0) ∪ (I=1 and S>0) ∪ (I=1 and S≤0). By taking P = Ps ∩ Pc, we obtain all the partition classes of P as follows:

class 1. I≤0

class 2. I=1 and S>0

class 3. I=1 and S≤0

class 4. 2≤I≤5 and S>0

class 5. 2≤I≤5 and S≤0
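The intersection of the two partitions can be checked mechanically. The sketch below writes the classes of Ps and Pc as predicates over (I, S) and forms P as the set of nonempty pairwise intersections. Enumerating the classes over a small finite sample of the input domain is an assumption of this sketch, as is treating S as a directly supplied value. The names `ps`, `pc`, `samples`, and `intersect` are hypothetical.

```python
from itertools import product

# Classes of Ps (from the specification) and Pc (from the code paths).
ps = [
    lambda i, s: i <= 0,
    lambda i, s: 1 <= i <= 5 and s > 0,
    lambda i, s: 1 <= i <= 5 and s <= 0,
]
pc = [
    lambda i, s: i <= 0,
    lambda i, s: 2 <= i <= 5 and s > 0,
    lambda i, s: 2 <= i <= 5 and s <= 0,
    lambda i, s: i == 1 and s > 0,
    lambda i, s: i == 1 and s <= 0,
]

# Finite sample of the input domain: I in -1..5 and one negative, zero, positive S.
samples = list(product(range(-1, 6), (-1.0, 0.0, 1.0)))


def classes(partition):
    """Materialize each class as the set of sample inputs satisfying its predicate."""
    return [frozenset(x for x in samples if pred(*x)) for pred in partition]


def intersect(p1, p2):
    """Every nonempty pairwise intersection becomes one class of the result."""
    return [a & b for a in classes(p1) for b in classes(p2) if a & b]


p = intersect(ps, pc)
print(len(p))
```

On this sample the result has five classes, matching the list above, and the classes together cover every sample point exactly once.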

6.2 Reusability Of Original Test Cases

The changes made to the module may make the application of the original test cases to the modified module invalid. Hence, it is necessary to determine whether the original test cases are correct inputs to the modified module. To do this, we need to examine the total number of input values necessary to invoke the modified module and the order of these values. For example, because of the modification, another input may be needed to invoke the modified module correctly. In such a case, after the modification, the original test cases are no longer valid to test the modified module and must be discarded. In the case of a modification for error correction, the input which detected the existence of errors should be included in T.

6.3 Assignment Of Original Test Set To The Input Partition Classes

When original test cases are correct inputs to the modified module, we should use them in order to generate fewer new test cases. This is done in our method because it is much easier to see if a given test case satisfies a given input class domain than to generate a new test case which satisfies a given input class domain. As long as some t in T satisfies the domain constraint of the jth partition class, we assign it to the jth partition class. A similar idea was also used in CASEGEN [RAMA76]. To illustrate this, let us consider a set of original test cases shown in Figure 6.4 for the program shown in Figure 6.3. Since test case 1 satisfies the domain constraint of partition class 1, we assign test case 1 to partition class 1. Similarly, test cases 2, 3 and 4 are assigned to partition classes 2, 5 and 4 respectively.


    Test case   = ( I, A(1),        A(2), A(3),  A(4), A(5) )
    Test case 1 = ( 0, 0.0,         0.0,  0.0,   0.0,  0.0  )
    Test case 2 = ( 1, 4.3,         0.0,  0.0,   0.0,  0.0  )
    Test case 3 = ( 2, -7.89,       2.0,  0.0,   0.0,  0.0  )
    Test case 4 = ( 3, [illegible], 6.32, -7.34, 0.0,  0.0  )

Figure 6.4. Original test cases prepared for the program in Figure 6.3.
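The assignment of original test cases to partition classes can be sketched as a simple classification pass. The sketch assumes that S is the sum of A(1) through A(I), whose sign equals that of the average, and that I never exceeds the array dimension of 5. Test case 4 is left out because its A(1) value is not legible in this copy. The function and variable names are hypothetical.

```python
def partition_class(i, a):
    """Return the class number (1 to 5) of P for the input (I, A)."""
    if i <= 0:
        return 1
    s = sum(a[:i])              # assumption: S is the sum of the first I elements
    if i == 1:
        return 2 if s > 0 else 3
    return 4 if s > 0 else 5    # 2 <= I <= 5


# Test cases 1 to 3 as read from Figure 6.4.
tests = {
    1: (0, [0.0, 0.0, 0.0, 0.0, 0.0]),
    2: (1, [4.3, 0.0, 0.0, 0.0, 0.0]),
    3: (2, [-7.89, 2.0, 0.0, 0.0, 0.0]),
}

assigned = {}
for t, (i, a) in tests.items():
    assigned.setdefault(partition_class(i, a), []).append(t)

# Classes with no original test case need newly generated ones.
uncovered = [c for c in range(1, 6) if c not in assigned]
print(assigned, uncovered)
```

With only these three test cases, classes 3 and 4 remain uncovered. According to the text, test case 4 falls in class 4, which would leave class 3 as the only class needing a new test case.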

6.4 Selection Of Original Test Cases For Execution

We only need to execute those original test cases which exercise any modified program blocks of the modified module, because execution of the rest of the original test cases will only follow the same sequence of the same statements as they did before the module was modified, and the same test execution results are generated. Fischer [FISC77] developed a method to select the test cases whose executions exercise the modification, but his method is only applicable to those modifications which do not change the control structure of the program. In this section, we will present a heuristic method which can be applied to any kind of modification, including the case where the control structure of the module is changed. Because it is an unsolvable problem to determine which sections of a module will be traversed for given test cases before their execution, we can determine whether a given test case will traverse any modified portion of the module only during or after its execution. We will first discuss what information is needed to select test cases, and then present the selection algorithm.

6.4.1 Necessary Information For Test Selection

Let  us  define  a  path  in  a  program  graph  as  a  sequence  of 
nodes  and  branches.  A  module  path  is  a  path  which  starts  from 
a  node  corresponding  to  the  module  entry  and  ends  at  a  node 
corresponding to a module exit. The reaching set of a node X
in the program graph is a set of all possible paths that start
from the entry node and end at the node X. The reaching set of
a given node in the program graph is identified by using the
depth-first search algorithm. We store the reaching set
information in the program graph by marking every branch which
belongs  to  some  path  in  the  reaching  sets  of  the  modified 
nodes.  The  reaching  set  information  stored  at  each  branch  in 
the  program  graph  is  used  to  select  test  cases. 
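
The branch marking described above can be sketched in Python as
follows. This is a sketch under assumptions, not the implemented
tool: the graph encoding (successor lists keyed by block number),
the example graph, and the names reachable and mark_branches are
hypothetical. Instead of enumerating paths, which may be infinitely
many when the graph has loops, the sketch marks a branch (s, d) when
s is reachable from the entry and d can reach a modified node. These
are exactly the branches that lie on some path of a reaching set.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Graph = Dict[Hashable, List[Hashable]]


def reachable(graph: Graph, starts: Iterable[Hashable]) -> Set[Hashable]:
    """Iterative depth-first search: every node reachable from `starts`."""
    seen: Set[Hashable] = set()
    stack = list(starts)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return seen


def mark_branches(graph: Graph, entry: Hashable,
                  modified: Iterable[Hashable]) -> Set[Tuple[Hashable, Hashable]]:
    """Return the branches lying on some path from entry to a modified node."""
    reverse: Graph = {}
    for src, dsts in graph.items():
        for dst in dsts:
            reverse.setdefault(dst, []).append(src)
    from_entry = reachable(graph, [entry])
    # Searching the reversed graph yields the nodes that can reach a
    # modified node (the modified nodes themselves included).
    to_modified = reachable(reverse, modified)
    return {(src, dst)
            for src, dsts in graph.items() for dst in dsts
            if src in from_entry and dst in to_modified}


# Hypothetical program graph; blocks 4 and 12 are taken as modified.
graph = {1: [2], 2: [3, 13], 3: [4], 4: [5], 5: [6, 12], 6: [5], 12: [13], 13: []}
marked = mark_branches(graph, entry=1, modified={4, 12})
```

In this hypothetical graph the branches (2, 13) and (12, 13) stay
unmarked, because neither leads to a modified block.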

We use the symbolic execution tree [KING76] to keep track of the
test execution information. In its original definition, each node
in the symbolic execution tree corresponds to an execution of a
statement. We modify the definition of the symbolic execution
tree so that each node in the tree corresponds to an execution
of a Decision-to-Decision Path (DD-path) [HUAN75]. A DD-path
is a path in a flow-chart which satisfies the following
conditions: 1) its first edge starts either from an entry node
or a decision box; 2) its last edge terminates either at a
decision box or an exit node; and 3) there are no decision
boxes on the path except at both ends. This modification
reduces the storage requirements without losing the necessary
test selection information. At each node of the modified
symbolic execution tree, the information called STATE is
stored. STATE is a triple <V, LC, PC>, where V is a vector
containing all the values for the variables in the program, LC
points to the last statement of the DD-path, and PC stores the
constraint to execute the path so far traced.

As the execution proceeds, V, LC, and PC are updated in
accordance with the result of the execution. PC is initially
assigned the value "TRUE" and is updated whenever the
execution has gone through a decision point and selected an
outcome of it. The new PC is computed by taking an
intersection of the old PC and the constraint needed to be
satisfied in order to take the selected outcome. Therefore, PC
stored at a given node in the symbolic execution tree contains
the path constraint for the path between the root node and this
node. Note that PC does not change during the execution of a
DD-path. On the other hand, V may change during execution of a
DD-path, because any assignment statements in the DD-path are
executed and change the values of the variables they assign.
V stored at a node of the symbolic execution tree
contains the values of variables after execution of the
sequence of statements corresponding to the path starting from
the root node and ending at this node. Note that the V stored
at each node as a part of the STATE information is the one
computed after the execution of the last statement of the
corresponding DD-path.
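
The STATE triple and its update at a decision point can be sketched
in Python as follows. This is an illustrative sketch only: the class
name State, the function extend, the statement numbers, and the
representation of PC as a tuple of constraint strings (read as their
conjunction, with the empty tuple standing for "TRUE") are
assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class State:
    V: Dict[str, float]        # variable values after the DD-path's last statement
    LC: int                    # location of the last statement of the DD-path
    PC: Tuple[str, ...] = ()   # conjunction of branch constraints; () means TRUE


def extend(state: State, constraint: str, lc: int,
           updates: Dict[str, float]) -> State:
    """STATE of the child tree node reached by taking one decision outcome.

    PC grows by exactly one constraint per decision point; V reflects the
    assignments executed along the new DD-path.
    """
    return State(V={**state.V, **updates}, LC=lc, PC=state.PC + (constraint,))


# Hypothetical root node and one child node.
root = State(V={"I": 2.0}, LC=2)
child = extend(root, "I > 0", lc=5, updates={"R": 0.0})
```

Because each node keeps its own copy of V, LC and PC, an execution
can later be resumed from any stored node without re-executing the
statements on the path from the root.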

In addition, we need a table called the test information table.
This table keeps the test selection information, and it has three
columns. The first column stores a test case identification, the
second column stores a symbolic execution tree node
identification to show where the test case specified in the first
column stopped being executed, and the third column stores
whether the test case specified in the first column was selected.

6.4.2 Overview Of Selective Test Execution

Our  method  is  developed  by  utilizing  the  reaching  set 
information  and  the  symbolic  execution  tree. 

The selective execution using the reaching set of modified
nodes can be described as follows: any execution is continued
as long as it follows any path in the reaching sets of modified
nodes. This is because such a path eventually leads the
execution to a modified node. The execution is terminated as
soon as it is found that the execution does not follow any
path in the reaching sets of modified nodes. Since all the
branches belonging to the paths in the reaching sets of
modified nodes are marked, one can tell whether or not the
current execution still follows any path in the reaching sets of
modified nodes by observing whether the outcome selected by the
execution at each decision point is marked. If the execution
so far followed a path in the reaching sets of the modified
nodes and a marked branch is selected, we know the execution is
still following a path in the reaching sets of the modified
nodes. On the other hand, if an unmarked branch is selected,
one can tell that the execution no longer follows a path which
leads the execution to a modified node. The execution is
continued in the former case and terminated in the latter case.

The symbolic execution tree is used in order to process
more than one test case at one time. The symbolic execution
tree is constructed as the data-driven symbolic execution
proceeds. Initially, all test cases are stored in
Current-Test-Cases (CTC), which holds test cases relevant to the
execution. When a decision point of the module is encountered,
we must choose an outcome of the predicate because not all the
test cases in CTC necessarily evaluate the predicate to the
same outcome, and not all outcomes lead to the execution of
modified nodes. Based on the selective execution previously
described, we have set the outcome selection criterion as
follows: When no modified node has been traversed by the
execution, select an outcome whose constraint can be satisfied
by at least one test case in CTC and whose corresponding branch
in the program graph is marked; once one modified node has
been traversed, select an outcome whose constraint can be
satisfied by at least one test case in CTC.
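
The outcome selection criterion can be sketched in Python as
follows. This is a hedged illustration: the function name
select_outcome, the representation of an outcome as a (branch,
predicate) pair, and the example data are assumptions and do not
come from the implemented tool.

```python
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple

Branch = Tuple[str, str]       # (from program section, to program section)
TestInput = Dict[str, int]     # hypothetical test input: variable -> value


def select_outcome(outcomes: List[Tuple[Branch, Callable[[TestInput], bool]]],
                   ctc: Iterable[TestInput], count: int,
                   marked: Set[Branch]) -> Optional[Branch]:
    """Return a selectable outcome, or None if no outcome can be selected.

    count is the number of modified nodes traversed so far. While it is
    zero, only marked branches qualify; afterwards any outcome taken by
    at least one test case in CTC qualifies.
    """
    tests = list(ctc)
    for branch, holds in outcomes:
        if not any(holds(t) for t in tests):
            continue               # no test case in CTC takes this outcome
        if count > 0 or branch in marked:
            return branch
    return None


# Hypothetical decision point with one unmarked and one marked outcome.
outcomes = [(("2", "13"), lambda t: t["I"] <= 0),
            (("2", "3"), lambda t: t["I"] > 0)]
marked = {("2", "3")}
first = select_outcome(outcomes, [{"I": 0}, {"I": 2}], count=0, marked=marked)
blocked = select_outcome(outcomes, [{"I": 0}], count=0, marked=marked)
after_mod = select_outcome(outcomes, [{"I": 0}], count=1, marked=marked)
```

When several outcomes qualify, the sketch simply returns the first
one in list order; any qualifying outcome would satisfy the
criterion.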

After the outcome is determined, the test cases in CTC
which did not select this outcome are removed and stored in the
current tree node. The execution is continued towards the
selected outcome of the predicate by adding a new tree node.
Note that at that time test cases in CTC are the ones which
choose the selected outcome. When the execution is terminated
because it has reached an exit point of the module or because
no outcome can be selected, the test selection result is stored
in the test information table, using "selected" status for the
former and "not selected" for the latter case. Then, the
symbolic execution tree is followed backward from the node
where the execution was stopped to the root node. When the
first tree node containing test cases is found, a new execution
is started by assigning the test cases stored in that node to
CTC and using the STATE information stored in that node. When
no such node is found, the algorithm terminates. Now, let us
present the algorithm to select the test cases.

6.4.3 Algorithm To Select Test Cases

Step 1. Set CTC to the original test cases, from which the
test cases are selected. Set COUNT, a counter for the number of
traversed modified program blocks, to 0, the statement
pointer ST to the first executable statement, and the tree
pointer TN to the root of the symbolic execution tree.

Step 2. If a modified block is traversed, increment COUNT by
one. If ST is an exit statement, store the test cases in CTC
in the test information table with the status "selected" and go
to Step 4. If ST is a decision statement, go to Step 3.
Otherwise, set ST to the next statement and repeat Step 2.

Step 3. Select an outcome of the decision statement based on the
outcome selection criterion by using the tests in CTC. If no
outcome can be selected, store all the test cases in CTC
in the test information table with the status "not selected"
and go to Step 4. Otherwise, store in TN the test cases in CTC
which do not satisfy the constraint of the selected outcome, and
remove these test cases from CTC. Store COUNT in TN. Generate
a new tree node as a successor of TN and set TN to this node.
Set ST to the next statement and go to Step 2.

Step 4. Trace the tree from TN towards the root. Set TN to
the first tree node encountered which holds test cases, assign
those test cases to CTC, resume with the STATE information and
COUNT stored in that node, and go to Step 2. If no such node
exists, terminate.
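
Steps 1 to 4 can be sketched in Python as follows. This is a
simplified sketch, not the VAX-FORTRAN implementation: it assumes
that branch predicates depend only on the test input (so V never
changes), that every block is modelled as a list of (successor,
predicate) outcomes with an empty list marking an exit, and that a
stack of saved (block, COUNT, test cases) entries stands in for the
backward trace of the symbolic execution tree. All names and the
example graph are hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

Outcome = Tuple[str, Callable[[int], bool]]   # (successor block, branch predicate)


def select_tests(graph: Dict[str, List[Outcome]], entry: str,
                 modified: Set[str], marked: Set[Tuple[str, str]],
                 inputs: Dict[int, int]) -> Dict[int, str]:
    """Return the test information table: test id -> selection status."""
    table: Dict[int, str] = {}
    count0 = 1 if entry in modified else 0
    # Step 1: CTC holds every original test; the stack holds tree nodes
    # that still store test cases (most recent = deepest in the tree).
    saved: List[Tuple[str, int, List[int]]] = [(entry, count0, sorted(inputs))]
    while saved:                                   # Step 4: resume from a saved node
        node, count, ctc = saved.pop()
        while True:
            outcomes = graph[node]
            if not outcomes:                       # Step 2: exit reached
                table.update({t: "selected" for t in ctc})
                break
            chosen = None
            for succ, holds in outcomes:           # Step 3: selection criterion
                takers = [t for t in ctc if holds(inputs[t])]
                if takers and (count > 0 or (node, succ) in marked):
                    chosen = (succ, takers)
                    break
            if chosen is None:
                table.update({t: "not selected" for t in ctc})
                break
            succ, takers = chosen
            rest = [t for t in ctc if t not in takers]
            if rest:                               # store non-takers with COUNT
                saved.append((node, count, rest))
            node, ctc = succ, takers
            if node in modified:
                count += 1
    return table


def always(_: int) -> bool:
    return True


# Hypothetical module: block "m" is the only modified block.
graph: Dict[str, List[Outcome]] = {
    "entry": [("d1", always)],
    "d1": [("exit", lambda x: x <= 0), ("d2", lambda x: x > 0)],
    "d2": [("exit", lambda x: x <= 1), ("m", lambda x: x > 1)],
    "m": [("exit", always)],
    "exit": [],
}
marked = {("entry", "d1"), ("d1", "d2"), ("d2", "m")}
table = select_tests(graph, "entry", {"m"}, marked, {1: 0, 2: 1, 3: 2, 4: 3})
```

Passing a COUNT that starts at 1 instead of 0 would make every
outcome selectable, so that all test cases run to an exit.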

6.4.4 An Example

To  illustrate  the  test  selection  algorithm,  let  us 
consider  the  program  shown  in  Figure  6.3  again.  The  program 
graph  for  this  program  with  the  reaching  set  (program  sections 
4 and 12 are modified) is shown in Figure 6.5. The addition of
R=0 is done in program section 4, and the correction of "GO TO
50" to "GO TO 40" is done in program section 12. The original
test cases are shown in Figure 6.4. Figure 6.6 (a) and (b)
show the symbolic execution tree and the test information table
after the execution is over. The tree nodes are numbered as
they are generated and attached to the tree. Initially, CTC
contains four test cases 1, 2, 3 and 4, and the symbolic

Figure 6.6. Result of test selection on the program shown in
Figure 6.3 with test cases given in Figure 6.4:
(a) symbolic execution tree, and (b) contents of
test information table.


point corresponds to a marked branch from program section 2 to
program section 3 and this outcome can be selected by test
cases 2, 3 and 4, the False outcome is chosen. First, test
case 1 is removed from CTC because it did not choose the
selected outcome and gets stored at the root node. A new tree
node is attached to the symbolic execution tree. In the same
way, at the next decision point, IF (I.LE.1), the False outcome
is selected while test case 2 is stored in the tree node 2. At
the next decision point, DO 30 L=1,I, since the modified
program section 4 has been already traversed (COUNT = 1), the
second part of the outcome selection criterion is used. Since
both test cases 3 and 4 select the False outcome, no test case
is removed from CTC and the new node 4 is added to the symbolic
execution tree. Test cases 3 and 4 do not select the same
outcome at the next decision point, DO 30 L=1,I. The True
outcome is arbitrarily selected and a new node 5 is attached.
Test case 4, which selects the False outcome of this decision
point, is removed from CTC and stored at node 4. Hereafter,
CTC contains only test case 3. When the current execution
terminates at the RETURN statement, the tree is followed
backward from node 6 to the root node. The tracing is
terminated at node 4 because it is the first node containing a
test case, test case 4. A new execution can be started from
this node because the STATE information contains the execution
information for the sequence of statements corresponding to a
path consisting of tree nodes 1, 2, 3 and 4. Similarly, test
cases 4 and 2 are executed. After the execution of test case
2, the tree is followed back to the root node where test case 1
is stored. However, no new execution is started from the root
node. The True outcome selected by test case 1 corresponds to
a branch between program section 2 and program section 13.
This branch is not marked and no modified program section has
been traversed (the count stored in the root node is zero), and
therefore no outcome is selected. Test case 1 is removed and
its status "not selected" is entered in the test information
table.


6.5 Test Case Generation And Execution

In order to satisfy the criterion of the input partition
method, which requires at least one test case for each
partition class, it may be necessary to generate additional
test cases. After the assignment of the original test cases is
completed, we generate test cases for partition classes which
do not have any test cases. Note that when none of the
original test cases are reused, we must generate a completely
new set of test cases, requiring the same amount of effort to
test the modified module as a new module. In the example
considered in Section 6.3, the partition class 3 is not
assigned any test case. A test case (1, -2.34, 0, 0, 0, 0) is
generated and executed to satisfy the testing criterion. The
method we have developed has another mode of execution. Under
this mode, all the test cases are executed to the end. The
algorithm used for this mode can be derived by making a minor
modification to the test case selection algorithm.

Algorithm to execute test cases: We modify Step 1 of the test
selection algorithm. Instead of setting COUNT to 0 initially, it
is set to 1. This algorithm executes all the test cases. Since
COUNT is always greater than zero, the outcome selection
criterion is to select an outcome to which at least one test case
in CTC evaluates the encountered predicate. This guarantees that
all the test cases are considered and none of them are removed
before they reach the exit of the module.

6.6 Output Validation Phase

Our method performs data-driven symbolic execution on the
target module and produces symbolic and real outputs, and the
domain information. According to Howden [HOWD78], the symbolic
outputs are effective in detecting computational errors. However,
symbolic execution alone is not effective in detecting domain
errors. We can store the domain information for each partition
class in a decision table and this information can be used to
validate the domain information obtained by executing a
program. In addition to the domain information, we can add the
output information for each partition class in the decision
table. The decision table for the example program discussed in
Section 6.1 is given in Figure 6.7. Each column of the table
corresponds to a partition class; the upper half of the
table is used for storing the domain information and the lower
half of the table is used for storing the expected output
information.

6.7  Debugging 

The method we have developed has a debugging capability
called test execution information display. The basic idea of
this capability comes from EXDAMS [BALZ69] and ISMS [FAIR75].
Since the necessary test execution information is stored in the
symbolic execution tree in the data base, we do not have to
execute the program again in order to debug the program. The
maintenance programmer can follow a module path forward or
backward easily, and retrieve information such as path
constraints and the value of each variable in symbolic and real
forms from the different locations of the symbolic execution
tree.

Figure 6.7. The decision table containing the partition classes
derived from specification in Figure 6.2 and the
program in Figure 6.3.


Note that the symbolic debugger and the symbolic executor
are not the same. Our method has four different kinds of
commands which the maintenance programmer can use. The first
kind of command, called test specification, is used to specify
a test case. The second kind of command, called move command,
enables the user to move the pointer within the symbolic
execution tree so that the user can retrieve the STATE
information from different tree nodes. The third kind of
command, called show command, shows the necessary test
execution information that the maintenance programmer requests
at different locations of the executed path. The last kind of
command, called break point command, can set the break points
and stop the tracing at the break points. Since the test
execution information is already stored in the data base,
additional commands which allow the user to retrieve different
debugging information can be easily added without making any
modification to the program portions for test execution of the
method.

6.8 Discussion And Future Work

Our method employs the input partition method for test
generation and data-driven symbolic execution for test
execution. The method has been demonstrated by implementing
parts of it: 1) selective execution of the original tests, 2)
test execution, and 3) debugging. These parts have been
implemented in VAX-FORTRAN, using a DEC VAX 11/780 computer
under the VAX/VMS operating system, and can be used to analyze
programs written in ANSI FORTRAN.

The revalidation of the program after it is modified is
very important in the maintenance phase. Presently, no
systematic approach exists for revalidating the modified
program in the maintenance phase. Our module revalidation
technique is developed to assist the maintenance programmer in
performing module revalidation for modified modules. We have also
developed a set of supporting tools which help the maintenance
programmer to apply our revalidation technique. Our module
revalidation technique uses the input partition method for test
case generation and data-driven symbolic execution for test
execution. We only considered programs which can be specified
using a cause/effect graph. For this kind of program, it is
much easier to derive the input partition from both the program
specification and code. The logic of this kind of program is
usually straightforward and has no complex loop structures.
The cause/effect graph manner of specification has actually
been used to specify complex real time software systems.

The application of the input partition method tends to
produce too many test cases. The number of partition classes
should be used as a testability measure, and modularization of
the program should be done by taking this factor into
consideration in the design stage of the development phase.
Although the input partition method requires much effort and
time for nontrivial modules, it identifies all the functions of
the module, and in the process of forming the partition, it
also detects missing path errors [GOOD75]. Furthermore, it can
also detect domain errors. The tool we have developed can
select and execute the necessary subset of the original test
cases and it can also execute all the generated test cases.
The results of the test selection and test execution using
data-driven symbolic execution include outputs in symbolic and
real forms. The outputs and the domain information obtained by
executing a program can be compared with the correct outputs and
the domain information stored in the decision table. This
increases the chance of detecting both computational and domain
errors. The real value output may detect overflow and truncation
errors which cannot be detected by conventional symbolic
execution. We used data-driven symbolic execution to solve most
of the problems encountered in symbolic execution. When the
existence of errors is detected, our tool can be used as a
debugger and provide useful test execution information for the
maintenance programmer. Since module testing is just a part of
the overall program revalidation strategy, we plan to develop
methods for integration testing and system function testing.

7.0 METRICS RELATED TO SOFTWARE MAINTENANCE

Since the major concern of our research work is with
software maintenance problems, we have focused our attention on
modifiability-related metrics. We have identified several
critical attributes that affect modifiability, namely, logical
stability, performance stability, and module strength and
coupling. These are all important factors in evaluating the
modifiability of a program. Individual measures for each
attribute have been developed. There will be a brief
description of each of these measures in the following
sections. Detailed results have appeared in [YAU78, 80a, 80e,
82c], [EJZA82]. A limited validation experiment for our logical
stability measure has also been conducted. The results will
also be presented in the following sections. The integration
of these attributes into a modifiability metric requires more
study.

Due to the experience gained from the implementation of
our logical stability measure, we feel that we need a more
efficient way of analyzing logical ripple effects for large
scale programs with a less stringent requirement on accuracy.
In this section, we will also present some preliminary results
on these problems.

7.1 Logical Stability Measure

The stability [YAU80e] of a program is the resistance to
the potential ripple effect that the program would have when it
is modified. The stability of a module is the resistance to the
potential ripple effect of a modification of the module on
other modules in the program. Since ripple effect is one of the
major reasons for introducing errors in the software
maintenance process, the stability of a program or module is
closely related to its modifiability.

7.1.1 Logical stability measure for modules

A measure for the logical stability of a module k, denoted
by $LS_k$, is defined [YAU80e] as follows:

$$LS_k = 1 / LRE_k$$

where

$LRE_k$ = the logical ripple effect measure of a primitive
type of modification to a module k, where a primitive type of
modification is considered as a modification of a variable
definition of module k

$$LRE_k = \sum_{i \in V_k} [P(ki) \cdot LCM_{ki}]$$

$V_k$ = the set of all variable definitions in module k,

$P(ki)$ = the probability that a particular variable
definition i of module k will be selected for modification,

$LCM_{ki}$ = the logical complexity of each modification to
variable definition i in module k

$$LCM_{ki} = \sum_{t \in W_{ki}} C_t$$

$C_t$ = the complexity measure of module t,

$W_{ki}$ = the modules involved in the intermodule change
propagation as a consequence of modifying variable definition i
of module k

$$W_{ki} = \bigcup_{j \in Z_{ki}} X_{kj}$$

$Z_{ki}$ = the set of interface variables which are affected
by logical ripple effect as a consequence of modification to
variable definition i in module k,

$X_{kj}$ = the set of modules involved in intermodule change
propagation as a consequence of affecting interface variable j
of module k.


The logical stability measure may be normalized to have a
range of 0 to 1 with 1 as the optimal logical stability. This
normalized logical stability can be utilized qualitatively or
it can be correlated with collected data to provide a
quantitative measure of stability. The normalized logical
stability measure for module k, denoted by $LS_k^*$, is defined
as follows:

$$LS_k^* = 1 - LRE_k^*$$

where

$LRE_k^*$ = the normalized logical ripple effect measure for
module k

$$LRE_k^* = LRE_k^+ / C_p$$

$C_p$ = the total complexity of the program, which is equal
to the sum of all the module complexities in the program,

$LRE_k^+$ = the modified logical ripple effect measure for
module k

$$LRE_k^+ = C_k + \sum_{i \in V_k} [P(ki) \cdot LCM_{ki}]$$

$C_k$ = the complexity of module k.
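
The module-level measures can be illustrated with a small numeric
sketch in Python. All figures here (module complexities,
probabilities, and the sets of affected modules) are invented for
the example, and the function names are hypothetical. The sketch
takes the sets W_ki as given rather than deriving them from a ripple
effect analysis, and reads the stability as the reciprocal of the
ripple effect measure, LS_k = 1/LRE_k.

```python
from typing import Dict, Set, Tuple

# Per variable definition i of module k: (P(ki), W_ki).
VarDefs = Dict[str, Tuple[float, Set[str]]]


def lre(var_defs: VarDefs, complexity: Dict[str, float]) -> float:
    """LRE_k = sum_i P(ki) * LCM_ki, where LCM_ki = sum of C_t over t in W_ki."""
    return sum(p * sum(complexity[t] for t in w) for p, w in var_defs.values())


def normalized_ls(k: str, var_defs: VarDefs, complexity: Dict[str, float]) -> float:
    """LS*_k = 1 - (C_k + LRE_k) / C_p, with C_p the total program complexity."""
    lre_plus = complexity[k] + lre(var_defs, complexity)
    return 1.0 - lre_plus / sum(complexity.values())


# Hypothetical three-module program and the variable definitions of module A.
complexity = {"A": 4.0, "B": 6.0, "C": 10.0}
var_defs_a: VarDefs = {"x": (0.5, {"B"}), "y": (0.5, {"B", "C"})}

lre_a = lre(var_defs_a, complexity)            # 0.5*6 + 0.5*(6+10)
ls_a = 1.0 / lre_a
ls_a_star = normalized_ls("A", var_defs_a, complexity)
```

With these invented figures, modifying y is costlier than modifying
x because its change propagates to both B and C.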


7.1.2 Logical stability measure for programs


A measure for the logical stability of a program, denoted
by LSP, is defined [YAU80e] as follows:

$$LSP = 1 / LREP$$

where

LREP = the measure for the potential logical ripple
effect of a primitive modification to a program

$$LREP = \sum_{k=1}^{n} [P(k) \cdot LRE_k]$$

$P(k)$ = the probability that a modification to module k
may occur,

$LRE_k$ = the logical ripple effect measure of a primitive
type of modification to a module k,

n = the number of modules in the program.

The normalized logical stability measure for a program,
denoted by $LSP^*$, is defined as follows:

$$LSP^* = 1 - LREP^*$$

where

$LREP^*$ = the normalized logical ripple effect measure for
the program

$$LREP^* = \sum_{k=1}^{n} [P(k) \cdot LRE_k^*]$$

$P(k)$ = the probability that a modification to module k
may occur,

$LRE_k^*$ = the normalized logical ripple effect measure
for module k.
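
The program-level measures are probability-weighted sums of the
module-level ones, which a short Python sketch can make concrete.
The module names, probabilities and ripple effect values are
invented for illustration, the function names are hypothetical, and
the stability is read as the reciprocal LSP = 1/LREP.

```python
from typing import Dict


def lsp(p: Dict[str, float], lre: Dict[str, float]) -> float:
    """LSP = 1 / LREP, where LREP = sum_k P(k) * LRE_k."""
    lrep = sum(p[k] * lre[k] for k in p)
    return 1.0 / lrep


def lsp_star(p: Dict[str, float], lre_star: Dict[str, float]) -> float:
    """LSP* = 1 - LREP*, where LREP* = sum_k P(k) * LRE*_k."""
    return 1.0 - sum(p[k] * lre_star[k] for k in p)


# Hypothetical modification probabilities and module-level measures.
p = {"A": 0.5, "B": 0.25, "C": 0.25}
lre = {"A": 11.0, "B": 4.0, "C": 8.0}
lre_star = {"A": 0.75, "B": 0.5, "C": 0.25}

program_ls = lsp(p, lre)                  # 1 / (5.5 + 1.0 + 2.0)
program_ls_star = lsp_star(p, lre_star)   # 1 - (0.375 + 0.125 + 0.0625)
```

The weights P(k) let a frequently modified module dominate the
program-level value.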


7.2 Performance Stability Measure

The performance stability of a module k, denoted by $PS_k$,
is defined as follows:

$$PS_k = 1 / PREM_k$$

where $PREM_k$ is the performance ripple effect measure of a
primitive type of modification to a module k and is defined as

$$PREM_k = \sum_{i \in V_k} [P(ki) \cdot PREB_{ki}]$$

$P(ki)$ is the probability that variable definition i of module k
will be modified, $V_k$ is the set of all variable definitions
in module k, and $PREB_{ki}$ is the performance ripple effect of
modifying a block i in module k, which is defined as

$PREB_{ki}$ = the number of performance requirements affected
by modifying variable i of module k.

The performance stability of a program, denoted by PSP, is
defined as follows:

$$PSP = 1 / PREP$$

where PREP is the performance ripple effect measure of a
primitive type of modification to the program and is defined as

$$PREP = \sum_{k=1}^{n} [P(k) \cdot PREM_k]$$

$P(k)$ is the probability that module k will be modified, and n
is the number of modules in the program.

7.3 Design Stability Measure

It would be more valuable if we could apply the stability
measure at early stages of program development. Therefore, we
have developed a stability measure that may be applied during
the design phase. The design stability of a program, denoted
by PDS, is defined [YAU82c] as follows:

$$PDS = 1 / \left( \sum_{x} DLRE_x \right),$$

and the design stability for each module x is

$$DS_x = 1 / DLRE_x$$

if $DLRE_x \neq 0$, or

$$DS_x = 1$$

if $DLRE_x = 0$, where

$DLRE_x$ = the design logical ripple effect measure for
module x

$$DLRE_x = TG_x + \sum_{y \in J_x} TP_{xy} + \sum_{y \in J'_x} TP'_{xy}$$

$TG_x$ = the total number of assumptions made by other
modules about the global data items in $GD_x$,

$TP_{xy}$ = the total number of assumptions made by y about
the parameters in $R_{xy}$,

$TP'_{xy}$ = the total number of assumptions made by y about
the parameters in $R'_{xy}$,

$GD_x$ = the set of global data defined in module x,

$R_{xy}$ = the set of passed parameters returned from
module x to module y, where $y \in J_x$,

$R'_{xy}$ = the set of parameters passed from module x to
module y, where $y \in J'_x$,

$J_x$ = the set of modules which invoke module x,

$J'_x$ = the set of modules invoked by module x.
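
The design stability computation can be sketched in Python as
follows. The assumption counts are invented, the function names are
hypothetical, and the formulas follow one reading of the definitions
in this section: DLRE_x is the sum of TG_x, the TP_xy over invoking
modules and the TP'_xy over invoked modules, and PDS is the
reciprocal of the summed DLRE_x. Returning 1 for a program whose
total is zero mirrors the module-level rule and is an assumption of
this sketch, since that case is not defined in the text.

```python
from typing import Dict, Iterable


def dlre(tg: int, tp: Dict[str, int], tp_prime: Dict[str, int]) -> int:
    """DLRE_x = TG_x + sum of TP_xy (y invokes x) + sum of TP'_xy (x invokes y)."""
    return tg + sum(tp.values()) + sum(tp_prime.values())


def ds(dlre_x: int) -> float:
    """DS_x = 1/DLRE_x if DLRE_x != 0, else 1."""
    return 1.0 if dlre_x == 0 else 1.0 / dlre_x


def pds(dlre_values: Iterable[int]) -> float:
    """PDS = 1 / sum_x DLRE_x (1.0 when the sum is zero: sketch assumption)."""
    total = sum(dlre_values)
    return 1.0 if total == 0 else 1.0 / total


# Hypothetical module x: 2 assumptions about its global data, invoked by
# modules y and z, and invoking module w.
dlre_x = dlre(tg=2, tp={"y": 1, "z": 3}, tp_prime={"w": 2})
ds_x = ds(dlre_x)
program_pds = pds([dlre_x, 0, 2])   # three hypothetical modules
```

Since only assumption counts are needed, the measure can be
evaluated from design documents before any code exists.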
7.4 Module Strength and Coupling Metrics

We have developed the definitions of metrics for module
strength and coupling at the code level, which are presented in
detail in [EJZA82]. These metric definitions are
approximations of the heuristic definitions of module strength
and coupling as found in the literature on Structured Design
[MYER78], and are based on a new technique for estimating the
probabilities of data object interactions. These metrics are
designed to help estimate those qualities of software structure
which affect the amount of effort required during the program
maintenance activities of functional extension and large-scale
modification.

Module strength and coupling appear to be significant
attributes  affecting  the  modifiability  and  reusability  of 
computer  programs  and  should  be  important  elements  of  future 
metrics  for  modifiability  and  reusability.  Metrics  for  these 
important  structural  attributes  should  also  improve  the 
visibility  of  software  structure  and  provide  an  objective  means 
for  program  managers  to  evaluate  individual  pieces  of  software 
or  to  choose  between  alternate  solutions  to  the  same  problem. 

A software tool for computation of the module strength and
coupling metrics has been designed for the PASCAL language on
our DEC VAX 11/780 computer. Implementation of this tool is
nearly complete. Even though the tool is designed for the
PASCAL language, our technique is applicable to any
block-structured programming language. Validation and
refinement of the module strength and coupling metrics should
be performed by correlating them to their structured design
heuristics in experiments.

In  the  following  sections  we  will  briefly  describe  our 
metrics  for  strength  and  coupling.  The  metric  algorithms  are 
based  on  a  simple  program  graph  model  and  estimates  of  the 
probabilities  of  data  object  interactions.  This  model 
characterizes  those  program  attributes  most  relevant  to  the 
metrics.


7.4.1  Estimating Data Object Interaction

Estimates of the probabilities of data object interactions
are based on a structural distance function, which assigns an
integer value (greater than zero) to each pair of points in
program source text (for one procedure or function) where
definitions or references to data objects may occur. This
function is a count of the number of syntactic levels
(associated with statements) in the shortest (syntactic) path
from one point to the other. If there is a data flow path from
one definition or reference to another, the probability of
interaction is assumed to vary inversely with structural
distance. Actual probabilities associated with an average
execution path through a program are inaccessible to a static
analysis tool.


The  structural  distance  function  is  used  to  estimate  the 
probabilities  of  interaction  between  any  two  data  object 
definitions  inside  a  procedure  or  function  in  the  following 
four  steps: 


1)  A graph model is created for a procedure or function
which consists of edges: a) from nodes where data objects are
defined to nodes where they may be referenced, and b) from
nodes where data objects are referenced to nodes where data
object definitions may be affected. A distance value is
assigned to each edge in the graph, which corresponds to an
inverse probability of interaction. The distance values are
based on the structural distance function described above.


2)  The  graph  model  of  Step  1  is  simplified  to  indicate 
only  direct  distance  values  between  data  object  definitions. 
Data  object  definition  nodes  correspond  to  the  most  important 
events  in  an  execution  path  of  a  program.  Each  edge  in  this 
simplified  graph  corresponds  to  a  pair  of  consecutive  edges 
between  two  data  object  definition  nodes  via  a  reference  node. 
The  distance  value  assigned  to  each  new  edge  is  the  sum  of  the 
distances  associated  with  the  two  edges  from  the  original 
graph.

3)  The transitive closure (or shortest-path) of the matrix
of distance values associated with the graph from Step 2 gives
the  Data  Definition  Distance  Matrix  (DDDM)  for  a  procedure. 
The  closure  process  finds  all  direct  or  indirect  interactions 
between  data  object  definition  nodes. 

4)  The  DDDM  (from  Step  3)  for  a  procedure  is  simplified  so 
that  it  may  be  included  in  the  computation  of  the  DDDM's  for 
its  calling  procedures.  Simplification  is  done  by  removing  all 
nodes associated with local variables, and by summarizing all
data interactions for each parameter and global which is
referenced or defined in the procedure with a single input
reference and a single output definition. Steps 1-4 are
repeated until the DDDMs are constructed for all procedures.

Note  that  the  above  steps  must  be  applied  to  procedures 
and  functions  in  a  specific  order  so  that  information  is 
available  for  a  procedure  when  it  is  referenced  in  one  of  its 
calling  procedures.  Any  forward  referenced  procedures  in  the 
source  text  require  special  iterative  processing. 
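
Steps 2 and 3 can be sketched in Python as follows. This is only
an illustration under stated assumptions: the edge dictionaries,
the node names and the function names are hypothetical, the
closure is computed with the standard Floyd-Warshall algorithm
(the report does not name the algorithm used by its tool), and
taking the minimum when several reference nodes connect the same
pair of definitions is an assumption. A missing interaction is
represented by an infinite distance.

```python
import math

INF = math.inf


def collapse_references(defs, def_to_ref, ref_to_def):
    """Step 2: direct definition-to-definition distances.

    def_to_ref -- {(definition, reference): distance}
    ref_to_def -- {(reference, definition): distance}
    The new edge distance is the sum of the two original distances.
    """
    direct = {a: {b: INF for b in defs} for a in defs}
    for (a, ref), d1 in def_to_ref.items():
        for (ref2, b), d2 in ref_to_def.items():
            if ref == ref2:
                # Assumption: keep the shortest of parallel paths.
                direct[a][b] = min(direct[a][b], d1 + d2)
    return direct


def shortest_path_closure(defs, direct):
    """Step 3: all-pairs shortest distances (Floyd-Warshall)."""
    dist = {a: dict(direct[a]) for a in defs}
    for k in defs:
        for i in defs:
            for j in defs:
                via_k = dist[i][k] + dist[k][j]
                if via_k < dist[i][j]:
                    dist[i][j] = via_k
    return dist


# Hypothetical procedure with three definitions and two references.
defs = ["d1", "d2", "d3"]
def_to_ref = {("d1", "r1"): 1, ("d2", "r2"): 2}
ref_to_def = {("r1", "d2"): 1, ("r2", "d3"): 1}

direct = collapse_references(defs, def_to_ref, ref_to_def)
dddm = shortest_path_closure(defs, direct)
print(dddm["d1"]["d3"])   # indirect interaction d1 -> d2 -> d3
```

In this example d1 affects d3 only indirectly, so the entry for
(d1, d3) appears in the closure but not in the direct graph.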
7.4.2  Definition of Intra-Module Strength Metric

We consider a module here as any invocable procedure or
function, and define its strength, in the context of Structured
Design, as the level of interdependence between its
subcomponents. We use this definition to construct a strength
metric for a procedure or function from its DDDM.

Each  element  of  a  DDDM  is  interpreted  as  an  inverse 
probability  of  one  data  definition  node  affecting  another. 
Since we consider the data definition nodes to be the most
significant  nodes  in  the  procedure  (in  our  view  of  a  procedure 
as  a  means  to  alter  program  data),  these  nodes  are  associated 
with  the  'module  subcomponents'  of  the  strength  definition. 
The  'level  of  interdependence'  between  subcomponents  is 
interpreted  as  the  average  probability  of  interaction  between 
distinct  pairs  of  nodes. 

Our strength metric for a procedure A is denoted by SM(A),
and is defined as the average over the reciprocals of the
elements in the upper triangle minus the main diagonal of the
matrix which is the (element-wise) minimum of the DDDM(A) and
its transpose.

Two simple examples illustrate the range of values
attainable using SM on procedures of clearly different
strength: 1) SM = 1 for a procedure which initializes the
value of its single output parameter, and 2) SM = 0.2 for a
procedure which independently initializes 3 output parameters.
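
A literal reading of the SM definition can be sketched in Python
as below. The matrix values are hypothetical, the function name is
an assumption, and treating an infinite distance as a zero
interaction probability is also an assumption. The construction
of the DDDM nodes (which determines the SM = 1 and SM = 0.2
examples above) is not reproduced here.

```python
import math


def strength_metric(dddm):
    """Average reciprocal distance over distinct node pairs.

    dddm is a square list-of-lists distance matrix. For each pair
    (i, j) with i < j, the smaller of dddm[i][j] and dddm[j][i] is
    used, which corresponds to the strict upper triangle of the
    element-wise minimum of the matrix and its transpose.
    """
    n = len(dddm)
    if n < 2:
        raise ValueError("at least two nodes are needed to form a pair")
    reciprocals = []
    for i in range(n):
        for j in range(i + 1, n):
            d = min(dddm[i][j], dddm[j][i])
            # Assumption: no interaction path means probability zero.
            reciprocals.append(0.0 if math.isinf(d) else 1.0 / d)
    return sum(reciprocals) / len(reciprocals)


INF = math.inf
# Hypothetical DDDM for three definition nodes.
example = [
    [INF, 2, INF],
    [INF, INF, 4],
    [INF, INF, INF],
]
print(strength_metric(example))   # (1/2 + 0 + 1/4) / 3
```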


Coupling between two modules in a software system is
defined, in the context of Structured Design, as the level of
direct data object interaction between two modules. We use
this definition, along with discussions in the literature about
the way in which different situations affect the perception of
data coupling, to construct a coupling metric for any two
modules in a software system.


There is some form of data object interaction between
virtually every pair of modules in a system. We examine only
direct coupling between procedures and functions, since the
lowest level of coupling is described as having no direct
coupling. Direct coupling only occurs between a procedure and
its immediate subordinates (those procedures which it may call
directly), and between any other procedures which share global
data. When the DDDM for a procedure A is computed, all
parameter coupling information to its subordinates is
available, as well as all global coupling information
associated with the global data declared within procedure A.
An Inter-Module Data Object Coupling (IMDOC) value, which
is associated with each 'edge' of direct data flow between two
procedures A and B, will be defined later. One of two different
mechanisms, parameter coupling or global coupling, may support
each 'edge' of data flow.

Coupling (C) between procedures A and B is defined as
follows:

    C(A,B) = Sum of IMDOC(e) over each edge (e) of direct data
             flow between A and B.

The IMDOC value associated with each edge (e) of direct
data flow between any two procedures or functions is defined as
follows:

    $IMDOC(e) = \frac{DOC(e) \cdot PR(e) \cdot DR(e)}{DDDM(e) \cdot AIP(e)}$

where DOC = Data Object Complexity, such as an array is more
            complex than a simple integer.

      PR  = Parameter Rating, such as a global variable has a
            higher value than a parameter.

      DR  = Data Rating, such as Chapin-type rating for
            'through', 'data' or 'control' objects [CHAP79].

      AIP = Average Interaction Probability to other elements
            involved in the coupling.


Now, let us discuss the last four functions DOC, PR, DR
and AIP. An increase in data object complexity (DOC) increases
the coupling associated with an 'edge' of direct data flow
between procedures. This feature is incorporated in the
coupling measure in order to take into account the effects
of stamp coupling and in recognition of the fact that a more
complex data object has the potential to pass more
'information'. The DOC function is defined recursively
according to the structure of the data object.

There  is  significant  experimental  evidence  to  show  that 
direct  global  coupling  renders  programs  more  difficult  to 
modify than direct parameter coupling [DUNS80]. The parameter
rating  (PR)  function  has  value  2  for  global  edges  and  value  1 
for  parameter  edges. 


'Control' objects are understood to contribute more
inter-procedural coupling than 'data' objects. 'Through'
objects, which are not directly defined or referenced by one
(or both) of the procedures involved, but passed through for
use elsewhere, contribute less coupling than 'data' objects.
Chapin [CHAP79] defined and discussed 'control', 'data' and
'through' objects and suggested heuristics for identifying each
type. An automatable means of roughly identifying objects in
this fashion has been defined, and a data rating (DR) value
assigned to each type in order to take into account differences
in their contribution to coupling. The data rating function
has value 2 for 'control' objects, 1/2 for 'through' objects
and 1 for 'data' objects.

The grouping of data objects implied by the DDDM probably
affects coupling between procedures. If data objects interact
closely, they are probably related in function, and contribute
less to coupling when they are involved together in direct data
flow between procedures, as in the concept of data abstraction.
This relationship is made explicit in our measure of coupling
with the average interaction probability (AIP) function. AIP
is the average of the reciprocals of the distances (interpreted
as interaction probabilities) between the source node (v) of
the 'edge' under consideration and all other source nodes of
edges 1) which contribute to data flow between the same two
procedures, 2) which are all either parameters or globals
according to v, and 3) whose source nodes are located in the
same procedure as v.
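
The coupling computation can be sketched in Python as follows,
assuming the reading IMDOC(e) = DOC(e) PR(e) DR(e) / (DDDM(e)
AIP(e)) and C(A,B) = sum of IMDOC(e). The PR and DR values are
those stated in the text; the Edge record, its field names and
the example numbers are hypothetical, and DOC and AIP are taken as
given inputs because their full definitions are not reproduced in
this section.

```python
from dataclasses import dataclass

# Ratings stated in the text.
PARAMETER_RATING = {"global": 2.0, "parameter": 1.0}
DATA_RATING = {"control": 2.0, "data": 1.0, "through": 0.5}


@dataclass
class Edge:
    """One 'edge' of direct data flow between two procedures."""
    doc: float        # data object complexity (supplied by caller)
    mechanism: str    # "global" or "parameter"
    role: str         # "control", "data" or "through"
    distance: float   # DDDM(e), structural distance for this edge
    aip: float        # average interaction probability (supplied)


def imdoc(edge):
    """Inter-module data object coupling value of one edge."""
    numerator = (edge.doc
                 * PARAMETER_RATING[edge.mechanism]
                 * DATA_RATING[edge.role])
    return numerator / (edge.distance * edge.aip)


def coupling(edges):
    """C(A,B): sum of IMDOC(e) over all edges between A and B."""
    return sum(imdoc(e) for e in edges)


# Hypothetical edges between two procedures A and B.
edges_ab = [
    Edge(doc=1.0, mechanism="parameter", role="data",
         distance=2.0, aip=0.5),
    Edge(doc=3.0, mechanism="global", role="control",
         distance=4.0, aip=1.0),
]
print(coupling(edges_ab))
```

A complex global control object contributes more coupling than a
simple data parameter at a comparable distance, which matches the
qualitative discussion above.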
7.5  Validation of the Logical Stability Measure

Due to budget constraints, we have performed only a
limited number of experiments for validating the logical
stability measure at the procedure level. The goal of this
validation is to show that there is indeed a certain
correlation between the proposed normalized logical stability
measure computed for each procedure and the reciprocal of the
average number of code changes needed to keep the program
consistent and correct after a primitive change in that
procedure. Therefore, an experiment was devised to quantify the
average number of code changes needed for handling the ripple
effect caused by actual modifications for procedures in a
program. Then, the results were compared with the measures
applied to the program.


7.5.1  Experimental Procedures

We will now describe in detail the experiments used to
conduct the validation: how programs were selected, how
modification proposals were generated for these programs, how
the modifications were quantified, how the logical ripple
effect of each modification was measured, and the statistical
analysis used to determine the correlation figures.
7.5.1.1  Selection

A set of programs was prepared for the experiments. These
programs were restricted to PASCAL programs because the data
flow analysis tools we have developed are for PASCAL programs,
although the techniques are applicable to other programming
languages. We also limited the use of pointer typed data in the
programs because, using existing data flow analysis tools, it
would produce imprecise data flow information which would affect
the measures generated by the experiments. The length of each
of these programs was around 1200 lines of code and each
program contained more than 20 procedures. More detailed
information about the programs actually selected will be given
in Section 7.5.2.


7.5.1.2  Modification Proposal Generation

Specifications for each procedure were generated from the
program code and considered for possible modification. Many
realistic and feasible modification proposals to these
specifications were generated and evaluated for each procedure.
In this process, we chose those specifications which were
'local' to a particular procedure as the modification target.
Since it was not always possible for all procedures to have
meaningful local specifications to be modified, only those
procedures which could satisfy this requirement were selected so
that all (or most) of the primary modifications would be within
that procedure.


7.5.1.3  Quantification of the Realized Modifications

When the modification proposals were carried out at the
code level, three persons were involved in this process. The
first person was the author of the target program. All the
modifications were performed by the second person, and then
were checked for correctness and optimality by the author of
the program and the third person. We needed to restrict the
length of the programs used in the experiments because we wanted
to be sure that every modification could be correctly handled
by one person. The number of code changes needed for each
modification was quantified as the minimum number of 'tokens'
that had to be deleted from or added to the original program in
order to implement a particular modification proposal.


7.5.1.4  Actual Ripple Effect Estimation and Normalization

(1) Distinction between primary modifications and the
modifications caused by the ripple effect: All modifications
made to the procedure, where the specifications to be modified
were generated, were viewed as primary modifications. All other
necessary modifications outside the procedure were considered
as a result of logical ripple effect.

(2) Normalization: For the i-th modification proposal in
procedure M, the above two types of actual resulting
modifications were both quantified according to the token-count
method. Let the minimum number of tokens involved in the
primary modification be $P_i$, and the minimum number of tokens
involved in modification corresponding to the other type be
$R_i$. Then we use $N_i = (R_i + P_i)/P_i$ as the average number
of token changes caused by one primitive change (to a token) in
the code level of the i-th modification proposal. Suppose we
have n modification proposals in procedure M. Since they may
vary greatly in the difficulty or efforts involved in making
the change, we use their average to estimate the stability of
the module. Therefore, the estimated normalized stability
measure $LS^*_M$ for procedure M is calculated by

    $LS^*_M = 1 / [ (\sum_{i=1}^{n} N_i) / n ]$

This value has a range from 0 to 1, with 1 as the optimal
stability, which is exactly the same as the proposed stability
measure. This result will then be used to correlate with the
normalized stability measure calculated from the original
program code.
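
The normalization can be sketched in Python as below, assuming
the reading N_i = (R_i + P_i)/P_i and that the estimate is the
reciprocal of the mean of the N_i. The function names and the
token counts in the example are hypothetical.

```python
def change_ratio(primary_tokens, ripple_tokens):
    """N_i: token changes caused per primary token changed."""
    if primary_tokens <= 0:
        raise ValueError("a proposal needs at least one primary token")
    return (ripple_tokens + primary_tokens) / primary_tokens


def estimated_stability(proposals):
    """Reciprocal of the average N_i over a procedure's proposals.

    proposals -- list of (primary_tokens, ripple_tokens) pairs
    """
    ratios = [change_ratio(p, r) for p, r in proposals]
    return len(ratios) / sum(ratios)


# Hypothetical proposals for one procedure.
print(estimated_stability([(10, 0)]))           # no ripple effect
print(estimated_stability([(10, 20)]))          # N = 3
print(estimated_stability([(4, 6), (10, 0)]))   # N = 2.5 and 1
```

Because each N_i is at least 1, the estimate lies between 0 and 1
and equals 1 only when no modification ripples outside the
procedure.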
7.5.1.5  Statistical Methods Used in Analysis of the Results

We used Pearson product-moment correlation (r) to analyze
our experimental results [BRUN68]. The basic computation
formula for the product-moment correlation is

    $r = \frac{N(\sum XY) - (\sum X)(\sum Y)}{\{[N(\sum X^2) - (\sum X)^2][N(\sum Y^2) - (\sum Y)^2]\}^{1/2}}$

where N         = the number of scores for the pairs (X, Y)

      $\sum XY$ = the sum of the products of the paired scores

The Pearson product-moment correlation has been widely
used to determine if there is a relationship between two sets
of paired numbers. The significance of the resulting
correlation may be further tested. Two different procedures
have been used to test the hypothesis that r = 0 [BRUN68]. If the
sample size N is 30 or larger, a critical-ratio z-test can
easily be done. In this case, $z = r(N-1)^{1/2}$ is calculated as
an index to find the significance of the correlation. If the
sample size N is less than 30, a slightly more complicated
t-test should be done. In this case, the degree of freedom df,
and the index $t = r[(N-2)/(1-r^2)]^{1/2}$ are calculated to
determine the significance of the correlation.
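
As a worked illustration, the following Python sketch applies the
computation formula for r and the index t to the six Program 1
pairs listed in Table 7.2. The function names are hypothetical,
and df = N - 2 is the usual convention for this t-test rather than
something stated in the text.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation from the raw-score formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2)
                            * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


def z_index(r, n):
    """Critical-ratio index for samples of 30 or more pairs."""
    return r * math.sqrt(n - 1)


def t_index(r, n):
    """t index and degrees of freedom for fewer than 30 pairs."""
    return r * math.sqrt((n - 2) / (1.0 - r * r)), n - 2


# Program 1 rows of Table 7.2: computed measure vs. experiment.
measure = [0.08571, 0.45048, 0.36522, 0.49336, 0.63399, 0.60401]
experiment = [0.43396, 0.40000, 0.33333, 1.00000, 1.00000, 1.00000]

r = pearson_r(measure, experiment)
t, df = t_index(r, len(measure))
print(round(r, 4), round(t, 4), df)
```

With these six pairs the sketch gives r of about 0.7221 and t of
about 2.088 with df = 4, in agreement with the figures reported
for Program 1 in Table 7.2.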
7.5.2  Analysis of the Results

Six programs have been examined in the experimental
process of the logical stability measure validation. The
average length of the programs used is around 1200 lines of
PASCAL code. Each program has between 20 and 47 procedures. The
logical stability measure for each procedure in these programs
is listed in Table 7.1.

Thirty modification proposals have been generated and
applied to 28 procedures. The procedures marked with "*" in
Table 7.1 are those which were selected for experimentation.
Table 7.2 shows the correlation of logical stability measure
versus the experimental result for each modification proposal
on those 28 procedures. In order to show that our sampling was
representative, the means and standard deviations of the
logical stability measures for modules selected in each program
have been calculated. As shown in Table 7.2, they are quite
close to the means and standard deviations calculated from the
logical stability measures of all the modules in individual
programs.


The individual correlation and the probability that the
hypothesis of the actual correlation being zero is true are both
significant and are listed in Table 7.2. The overall
correlation coefficient calculated from all the results
(estimated logical stability measure based on our experiment)
shown in Table 7.3 against our computed logical stability
measure is 0.6338. The probability of the actual correlation
being zero is less than 0.1%. These facts indicate that there
is  indeed  a  correlation  between  our  computed  logical  stability 
measure  and  the  experimental  results. 


7.5.3  Discussion

The  main  purpose  of  this  experiment  is  to  show  that  there 
is  a  significant  correlation  between  the  proposed  normalized 
stability  measure  computed  for  each  procedure  and  the 
reciprocal  of  the  average  number  of  code  changes  needed  to  keep 
the  program  consistent  and  correct  after  a  primitive  change  has 
been  made  to  that  procedure. 

Although the result is positive, refinement of the
experimental process should be implemented provided that a
better environment and better tools exist.

1) The number of code changes is currently calculated by the
   changes of 'tokens'. When a statement includes a procedure
   or function call, it should have a suitable weight to
   reflect the code changes implied by the call.
Table 7.1  Logical stability measures for each module of the
           target programs used in the experiment.


Program 1:

(A pretty printer for PASCAL program stored in parse-tree
form : 1735 lines)

      Module name  Complexity  L.R.E. Factor  Logical stability measure

   1  PROGRAM              13       31.94783  0.6091493368
*  2  GETCHAR               8       97.14286  0.0857142806
   3  STORENEXTC            2       94.23077  0.1632107496
   4  SKIPSPACES            4       93.26087  0.1542533040
*  5  GETCOMMENT            3       60.19512  0.4504771829
*  6  IDTYPE                8       65.00000  0.3652173877
   7  GETIDENTIF            7       67.69566  0.3504725695
*  8  GETNUMBER             2       56.26316  0.4933638573
   9  GETCHARLIT            4       52.05263  0.5125858188
  10  CHARTYPE              7       82.00000  0.2260869741
  11  GETSPECIAL            2       58.65116  0.4725986123
  12  GETNEXTSYM            7       52.20000  0.4852173924
  13  GETSYMBOL             2       59.37879  0.4662714005
  14  INITIALIZE            1       90.12000  0.2076521516
  15  STACKEMPTY            2       75.00000  0.3304347992
  16  STACKFULL             2        7.00000  0.9217391610
  17  POPSTACK              2       72.17647  0.3549872637
  18  PUSHSTACK             1       73.00000  0.3565217257
  19  WRITECRS              3       38.12500  0.6423913240
  20  INSERTCR              2       71.00000  0.3652173877
* 21  INSERTBLAN            5       37.09091  0.6339921355
  22  LSHIFTON              5       52.30769  0.5016722679
  23  LSHIFT                2       60.85714  0.4534161687
  24  INSERTSPAC            3       65.92857  0.4006211162
  25  MOOELINEPO            2       24.50000  0.7695652246
* 26  PRINTSYMBO            2       43.53846  0.6040133834
  27  PPSYMBOL              5       62.72549  0.4110826850
  28  RSHIFTTOCL            2       42.60000  0.6121739149
  29  GOBBLE                2       45.94118  0.5831202269
  30  RSHIFT                5       57.12000  0.4598261118

  **  Summary :           115        6.94572  0.4481014609
(Table 7.1 - Continued)


Program 2:

(A pretty printer for PASCAL program stored in parse-tree
form : 1115 lines)

      Module name  Complexity  L.R.E. Factor  Logical stability measure

   1  PROGRAM               3       56.90909  0.5355884433
   2  PFIOLD                0        0.00000  1.0000000000
   3  PF1READ               0        0.00000  1.0000000000
   4  PF1CLOSE              0        0.00000  1.0000000000
   5  MOUE                  0        0.00000  1.0000000000
*  6  ADDTOKEN              2        0.00000  0.9844961166
   7  SYPARS                4       10.47368  0.8878008723
   8  CNSTPT                3        9.81818  0.9006342292
*  9  CNSLST                7        5.35897  0.9041939974
  10  VARTYP                2        7.80000  0.9240310192
  11  VARLST                5       10.18868  0.8822582960
* 12  TYPTYP                8        4.55844  0.9026477337
  13  UARBPT                1       10.71429  0.9091916084
  14  TYPEPT                1       10.62500  0.9098837376
  15  TYPLST                2       12.27273  0.8893586993
  16  BKPARS                1       35.71429  0.7153931260
  17  EXPRESSION           21        8.59140  0.7706093192
  18  EXPLIST               3       44.68750  0.9303294897
  19  VARUSAGE              6       27.35294  0.7414500713
  20  ACTUALPARM            4       28.17241  0.7506014705
  21  FUNCTIONCA            2       17.38769  0.8503279686
  22  CONSTUSAGE            9        0.61818  0.9254404306
  23  BEDLST                1        0.00000  0.9922480583
  24  STMTLST              10       20.82051  0.7610812783
  25  STMPARS               3       49.11111  0.5960379243
  26  ASLST                 1       41.77778  0.6683893204
  27  PSLST                 2       18.07692  0.8443649411
  28  IFLST                 2       47.14286  0.6190476418
  29  COLST                 5       44.23404  0.6183407903
  30  WHLST                 1       52.88889  0.5822566748
  31  RPLST                 1       33.57143  0.7320044041
  32  FTLST                 2       41.68421  0.6613627076
  33  ULST                  4        5.05882  0.9297765493
  34  PARLST                6        1.47826  0.9420289993
  35  BCKLST                7       19.76923  0.7924866080

  **  Summary :           129       20.65210  0.8215332031
(Table 7.1 - Continued)


Program 3:

(A theorem-prover : 1010 lines)

      Module name  Complexity  L.R.E. Factor  Logical stability measure

   1  PROGRAM               4        1.60000  0.9633986950
   2  INITIALIZE            1        0.00000  0.9934640327
   3  WRITEARGUM            5        0.00000  0.9673202634
   4  WRITELITER            2        3.66667  0.9629629850
   5  WRITECLAUS            1        1.90909  0.9809863567
   6  JOINNODE              3      136.00000  0.0915032623
   7  JOINLITERA            3      136.00000  0.0915032625
   8  JOINCLAUSE            3      147.00000  0.0196078420
   9  READINCLAU            3      111.27273  0.2531194091
  10  READINARG             3       80.16129  0.4433902502
  11  READINLITE            2       87.18182  0.4171122909
  12  READINSET             4      108.62921  0.2638613582
  13  COPYARGUME            3      121.42857  0.1867413321
  14  COPYLITERA            2      118.73333  0.2108932734
  15  COPYCLAUSE            2      122.92308  0.1835092902
  16  COMPAREARG            7      128.11763  0.1168781519
  17  COMPARELIT            2      126.85185  0.1578310132
  18  COMPARECLA            4      102.83714  0.3015873432
  19  REFUTATION            5      113.33334  0.2265794873
* 20  DELETELITE            6      128.41379  0.1214784980
  21  CHECKDUPLI            7      118.80000  0.1777777672
  22  RESOLUE               3      103.78947  0.3020295501
  23  INITIALIZE            2      139.00000  0.0784313679
  24  SEPARATEUA            3      136.00000  0.0915032625
  25  RESTOREUAR            3      136.00000  0.0915032625
  26  ULTUAL                4      135.00000  0.0915032625
  27  COLLECT               8      120.08334  0.1628539562
  28  COLLECTF              9      100.26144  0.2858729362
  29  STARTCOLLE            2       97.85714  0.3473389745
  30  MATCH                10      122.49580  0.1340143681
  31  UNIFYKEYLI            3      117.92000  0.1966013312
  32  APPLYSUBST            8      122.14865  0.1493552327
  33  FORMRESOLU            3      109.29338  0.2529831529
  34  FINDRESOLU            5       93.80000  0.3542483449
  35  SCANRESOLU            8       64.76336  0.5244225264
  36  GENERATE              4       26.43478  0.8010798693

  **  Summary :           153        6.37374  0.3332012892
(Table 7.1 - Continued)


Program 4:

(A time sharing operating system simulator : 1744 lines)

      Module name  Complexity  L.R.E. Factor  Logical stability measure

   1  PROGRAM               8      115.75000  0.5564516187
   2  MTHRAMDOM             0        0.00000  1.0000000000
   3  CURSOR                1        0.00000  0.9964157939
   4  JANRESET              6      204.03847  0.2471739650
*  5  REFRESH              26        0.31923  0.9056658149
*  6  DISPLAY               4        0.54412  0.9837128520
   7  FINDLOC               3      230.00000  0.1648745537
   8  ERRORCARD             3       55.60000  0.7899641991
   9  BINARY               14      217.01507  0.1719861031
  10  FINDDATA              5      264.00000  0.0358422995
  11  CHECK                 4      264.00000  0.0394265056
  12  JOBR                 11      218.52554  0.1773278117
  13  BATCHR                8      147.95062  0.4410371780
  14  READCARD              6      245.11111  0.0999601483
* 15  CHECKMAIN             3      138.18182  0.4939719439
  16  GENMEM                4      229.00000  0.1648745537
  17  GETSID                2      231.00000  0.1648745537
  18  GETRID                2      231.00000  0.1648745537
  19  GETTIME               1      232.00000  0.1648745537
* 20  ADDSECOND             1      212.89999  0.2333333492
  21  REMOUESECO            1      200.16667  0.2789725065
  22  ALLOCATECP            3      196.81250  0.2838261724
  23  RELEASECPU            1      164.00000  0.4086021781
* 24  ADDRJQ                1      215.05263  0.2256178260
  25  ALLOCATEMA            2      198.54839  0.2811886072
  26  REMOUERJQ             1      197.27272  0.2893450856
  27  RELEASEMAI            2      202.89473  0.2656102777
  28  ALLOCATE              6       63.36364  0.7513847947
  29  ALLOCATEIO            2      203.39999  0.2637993097
  30  RELEASEIOR            1      149.39999  0.4609318972
  31  WALLOCATEI            2      203.39999  0.2637993097
  32  RELEASEIOM            1      149.39999  0.4609318972
* 33  AODIORQ               1      212.44444  0.2349661589
  34  REMOUEIOQ             3      189.24138  0.3109627962
  35  ADDIOMQ               1      212.44444  0.2349661589
* 36  IOINTR                6      100.88889  0.6168857217
  37  IOCHECK              10      179.48215  0.3208525181
  38  CONVERT              17       73.00000  0.6774193645
  39  CONUERTREA           12       90.00000  0.6344085932
  40  PUTC                  2       74.09091  0.7272727489
  41  CLEANRBUF             2       69.00000  0.7455197573
  42  PRT                   2       51.00000  0.8100358248
  43  CSREPORT             35       33.45397  0.7546452880
  44  TERMREPORT           13       56.29134  0.7516439557
  45  BATCHREPOR           19       75.75000  0.6603942513
* 46  GENCOND               7      187.01819  0.3045942783
  47  UPDATE               14      129.15277  0.4869076014

  **  Summary :           279       10.52915  0.4362154007
(Table 7.1 - Continued)


Program 5:

(An assembler : 823 lines)

      Module name  Complexity  L.R.E. Factor  Logical stability measure

   1  PROGRAM              --       94.01389  0.1591691375
   2  ERROR                --       86.50000  0.2256637216
   3  PARSER               --       94.67857  0.0470922589
   4  OCTALNO              --       28.00000  0.7079645991
*  5  CHECKMODE            --       87.11111  0.2114060521
   6  ENTERDECK            --        9.14286  0.9102401733
   7  ENTERSUBNA           --       77.39474  0.2796925902
   8  UPDATESUB            --        4.00000  0.9557521939
   9  ENTERENTRY           --        9.18182  0.9098954201
  10  SEARCHSYM            --       97.00000  0.1150442362
* 11  ENTERSYM             --       99.00000  0.1150442362
  12  PRINTSYM             --        1.47059  0.9692868590
  13  PRINTDECK            --        1.03175  0.9554712772
  14  PASSONE              --        0.00000  0.9823008776
  15  CODEGENATR           --        0.00000  0.7168141603
  16  NEXTLINE             --       69.67742  0.3391378522
  17  FINDIDENT            --       72.94643  0.2836599350
  18  PASSTWO              --       11.84210  0.8686539531
* 19  CHECKCODE            --       33.00000  0.6902654767
* 20  NEXTCARD             --        8.46749  0.7303761840

  **  Summary :            --       12.80153  0.5586465597
(Table 7.1 - Continued)


Program 6:

(A time sharing operating system simulator : 684 lines)

      Module name  Complexity  L.R.E. Factor  Logical stability measure

   1  PROGRAM              --       57.83133  0.3038403392
   2  MTHRANDOM            --        0.00000  1.0000000000
   3  ENTERIOQ             --       60.78571  0.3563988209
*  4  IOREAD               --        1.00000  0.9791666865
*  5  IOWRITE              --        1.00000  0.9791666865
   6  RELEASEMEM           --       40.97561  0.5210874081
   7  FITMEM               --       57.24299  0.2891354561
   8  EXCEPTION            --       13.43000  0.6726042032
   9  TERMINATE            --       23.39024  0.7042683363
  10  ENTERMEMQ            --       83.00000  0.1145833135
  11  LEAUEMEMQ            --       66.40000  0.2979166508
  12  ENTERCPUQ            --       84.00000  0.1145833135
* 13  ASSIGN               --        0.00000  0.9583333135
  14  LEAOECPUQ            --       60.50000  0.3489583135
* 15  SEIZECPU             --       81.00000  0.1145833135
  16  COMPETECPU           --       73.65306  0.1911139488
  17  LEAUEIOQ             --       83.00000  0.1145833135
  18  SEIZEIO              --       79.52728  0.1299242377
* 19  BODY                 --       24.25000  0.6432291865
  20  CLEANUP              --       66.97222  0.2502893806
  21  BATCHREPOR           --        3.60784  0.9103349447
* 22  CARDIN               --       84.00000  0.1145833135

  **  Summary :            --        8.07312  0.4594855905
Table 7.2. Correlation analysis on logical stability
           for individual modules.


Program 1:

Module Number   Logical Stability Measure   Experimental Result

      2                 0.08571                   0.43396
      5                 0.45048                   0.40000
      6                 0.36522                   0.33333
      8                 0.49336                   1.00000
     21                 0.63399                   1.00000
     26                 0.60401                   1.00000

**** The correlation coefficient is 0.7221 with
     df = 4, t = 2.0878

     P (the actual correlation being zero) <= 10%

**** The mean of the logical stability measures is 0.43880
     with standard deviation 0.18193

     (The mean for all modules in the program is 0.44810
     with standard deviation 0.17988)


Note : The notation P(*) means the probability that * is true.
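The statistics quoted under each program can be checked by hand. The following is a minimal sketch, assuming that the correlation coefficient is the ordinary Pearson coefficient and that t is obtained from the usual relation t = r * sqrt(df) / sqrt(1 - r^2) with df = n - 2; the report does not spell out either formula, and the function names are illustrative only.

```python
import math

# Table 7.2, Program 1: logical stability measure vs. experimental result.
measure = [0.08571, 0.45048, 0.36522, 0.49336, 0.63399, 0.60401]
observed = [0.43396, 0.40000, 0.33333, 1.00000, 1.00000, 1.00000]


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, df):
    """t value for testing the hypothesis that the true correlation is zero."""
    return r * math.sqrt(df) / math.sqrt(1.0 - r * r)


r = pearson_r(measure, observed)
t = t_statistic(r, len(measure) - 2)
print(f"r = {r:.4f}, df = {len(measure) - 2}, t = {t:.4f}")
```

Under these assumptions the Program 1 data reproduce the tabulated coefficient and t value to within rounding.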


Program 2:

Module Number   Logical Stability Measure   Experimental Result

      6                 0.98450                   1.00000
      9                 0.90419                   1.00000
     12                 0.90265                   1.00000

**** The correlation coefficient is 0.9966 with
     df = 1, t = 12.1477

     P (the actual correlation being zero) <= 5%

**** The mean of the logical stability measures is 0.93045
     with standard deviation 0.038227

     (The mean for all modules in the program is 0.83010
     with standard deviation 0.13609)



Program 3:

Module Number   Logical Stability Measure   Experimental Result

     20                 0.12148                   0.19444

     (The mean for all modules in the program is 0.33320
     with standard deviation 0.29542)


Program 4:

Module Number   Logical Stability Measure   Experimental Result

      5                 0.90566                   0.94186
      6                 0.98371                   0.62162
     15                 0.49397                   1.00000
     20                 0.23333                   0.48971
     24                 0.22562                   0.35210
     33                 0.23497                   0.40749
     36                 0.61688                   1.00000
     46                 0.30459                   1.00000

**** The correlation coefficient is 0.4244 with
     df = 6, t = 1.1480

     P (the actual correlation being zero) <= 25%

**** The mean of the logical stability measures is 0.49984
     with standard deviation 0.28876

     (The mean for all modules in the program is 0.43622
     with standard deviation 0.27158)


Program 5:

Module Number   Logical Stability Measure   Experimental Result

      5                 0.21140                   0.35290
     11                 0.11500                   0.45630
     19                 0.69030                   1.00000
     20                 0.73040                   0.54540

**** The correlation coefficient is 0.6866 with
     df = 2, t = 1.3354

     P (the actual correlation being zero) <= 30%

**** The mean of the logical stability measures is 0.43678
     with standard deviation 0.27605

     (The mean for all modules in the program is 0.55865
     with standard deviation 0.34267)


Program 6:

Module Number   Logical Stability Measure   Experimental Result

      4                 0.97917                   0.50000
      5                 0.97917                   1.00000
     13                 0.95833                   1.00000
     15                 0.11458                   0.46591
     19                 0.64323                   0.33229
     22                 0.11458                   0.25000

**** The correlation coefficient is 0.6971 with
     df = 4, t = 1.9444

     P (the actual correlation being zero) <= 15%

**** The mean of the logical stability measures is 0.63151
     with standard deviation 0.38365

     (The mean for all modules in the program is 0.45949
     with standard deviation 0.32674)


Table 7.3. The summary correlation analysis of logical stability
           for all modules in the experiment.

Program Number   Logical Stability Measure   Experimental Result

      1                 0.08571                   0.43396
      1                 0.45048                   0.40000
      1                 0.36522                   0.33333
      1                 0.49336                   1.00000
      1                 0.63399                   1.00000
      1                 0.60401                   1.00000
      2                 0.98450                   1.00000
      2                 0.90419                   1.00000
      2                 0.90265                   1.00000
      3                 0.12148                   0.19444
      4                 0.90566                   0.94186
      4                 0.98371                   0.62162
      4                 0.49397                   1.00000
      4                 0.23333                   0.48971
      4                 0.22562                   0.35210
      4                 0.23497                   0.40749
      4                 0.61688                   1.00000
      4                 0.30459                   1.00000
      5                 0.21140                   0.35290
      5                 0.11500                   0.45630
      5                 0.69030                   1.00000
      5                 0.73040                   0.54540
      6                 0.97917                   0.50000
      6                 0.97917                   1.00000
      6                 0.95833                   1.00000
      6                 0.11458                   0.46591
      6                 0.64323                   0.33229
      6                 0.11458                   0.25000

**** The correlation coefficient is 0.6338 with
     df = 26, t = 4.1784

     P (the actual correlation being zero) < 0.1%

**** The mean of logical stability measures is 0.53859
     with standard deviation 0.31947

     (The mean for all modules in all programs is 0.50671
     with standard deviation 0.30872)

2)  Due to the limitations of the existing data flow analysis
    tool, we have to limit the use of pointer typed data in
    the programs to avoid imprecise data flow information. We
    hope to alleviate this constraint later on.

3)  The experiments are developed on the procedure level
    partly due to budget constraints. Experiments on the
    program level will be more realistic and valuable, but
    will require more manpower to perform.

4)  For large scale programs, a more efficient tool is needed
    to calculate the proposed measure of the program
    stability.

7.6. A Unified and Efficient Approach to Logical Ripple Effect
     Analysis Used in Metrics Calculation

Logical ripple effect analysis is required in computing
the logical stability metric for modules and programs
[YAU80e]. Theoretically, logical ripple effect analysis has to
be performed for each variable occurrence in the program to
reveal the logical ripple effect. Therefore, the efficiency of
the logical ripple effect analysis technique becomes a prime
factor affecting the usability of the metric. The logical
ripple effect analysis technique presented in Section 3
emphasizes accuracy in identifying logical ripple effect due to
given program modifications rather than efficiency in
identifying logical ripple effect for many program
modifications for statistical purposes, and hence is not
suitable for validating the stability measure, especially for
large scale programs. Trials using "student projects" or small
demonstration experiments are not acceptable representations of
the nature of the dynamics encountered in the development of
large-scale software systems. Therefore, it is desirable to
have an efficient way of performing logical ripple effect
analysis.

Software quality metrics are more usable if they can be
calculated in the early stages of the program life cycle
[KAFU81]. Strictly code-based metrics provide only an
after-the-fact evaluation of the quality of the software
structure. Such indications may come too late to correct any
structural deficiencies in a program that may already have been
completely implemented, possibly at great cost. Typically, it
is 100 times more expensive to correct errors in the
maintenance phase on large projects than in the requirements
phase [BOEH81]. Therefore, it is also desirable to apply the
ripple effect analysis technique during the program design
phase, so that data-flow oriented predictive software quality
measures may be developed and calculated at that time.
7.6.1 Formalization of logical ripple effect

We will discuss logical ripple effects caused only by
define-preserve-use type data flow propagation, as illustrated
in Fig. 7.1. That is, we consider only those ripple effects
that arise, during program execution, from using data items
which were defined somewhere else previously and which may be
preserved up to the point where the usage occurs. This is the
common understanding and consideration of logical ripple effect
in program modification, and it is by no means a severe
restriction.

          A is used to define B
                   |
                   |    There exists a control path
                   |    along which B is preserved.
                   |
                   v
          B is used to define C

Fig. 7.1 An example illustrating that variable A may cause
         potential logical ripple effect on variable C


Let PP be the collection of all procedures in the program.
Let VV be the set of all variable names used in the program.
Without loss of generality, it is assumed that there are no
distinct variables of the same name. The scope of a variable
may be viewed as an attribute of its name, and the terms
"define" and "modify" are used interchangeably.

Now we would like to make the following definitions:

DIRECTMOD is defined as a relation from PP to VV such that
$(P, v) \in \mathrm{DIRECTMOD}$ implies that v may be directly
modified in P.

DIRECTUSE is defined as a relation from PP to VV such that
$(P, v) \in \mathrm{DIRECTUSE}$ implies that v may be directly
used in P.

MOD is defined as a relation from PP to VV such that
$(P, v) \in \mathrm{MOD}$ implies that v may be modified in P
or some subcalls of P.

USE is defined as a relation from PP to VV such that
$(P, v) \in \mathrm{USE}$ implies that v may be used in P or
some subcalls of P.

CALL is defined as a relation in PP such that
$(P, Q) \in \mathrm{CALL}$ implies that P may call Q directly.
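To make the difference between the direct and the summary relations concrete, the sketch below derives MOD from DIRECTMOD by following CALL transitively. This derivation is an assumption made for illustration (it ignores parameter passing and aliasing) and is not stated in the report; the procedure and variable names are hypothetical.

```python
def reachable(call, start):
    """Procedures reachable from `start` through zero or more direct calls."""
    seen, stack = {start}, [start]
    while stack:
        p = stack.pop()
        for caller, callee in call:
            if caller == p and callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return seen


def summarize(direct, call, procedures):
    """Lift a direct relation (DIRECTMOD or DIRECTUSE) to P and its subcalls."""
    return {
        (p, v)
        for p in procedures
        for q in reachable(call, p)
        for (owner, v) in direct
        if owner == q
    }


# Hypothetical three-procedure program: MAIN calls CALC, CALC calls READ.
PP = {"MAIN", "CALC", "READ"}
CALL = {("MAIN", "CALC"), ("CALC", "READ")}
DIRECTMOD = {("READ", "buf"), ("CALC", "total")}

MOD = summarize(DIRECTMOD, CALL, PP)
print(sorted(MOD))
```

In this example READ modifies only `buf`, while MAIN inherits both `buf` and `total` through its subcalls. The same helper applied to DIRECTUSE would give USE under the same assumptions.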

Extending the usual definition of "use-definition chains"
to make it fit into inter-procedural data flow analysis, we
have the following definitions: For each occurrence of variable
v in instruction i of procedure P (denoted by $i_P$),
$\mathrm{DEFS}(v, i_P)$ is defined as the set of instructions
which may be the most recent definitions for v at run time.

$\mathrm{DIRECTMAPTO}_P$ is defined as a relation in VV, where
$P \in PP$, such that $(u, v) \in \mathrm{DIRECTMAPTO}_P$
implies that $(P, u) \in \mathrm{DIRECTUSE}$,
$(P, v) \in \mathrm{DIRECTMOD}$ and v is directly modified
depending on the value of u in P.

$\mathrm{MAPTO}_P$ is defined as a relation in VV, where
$P \in PP$, such that $(u, v) \in \mathrm{MAPTO}_P$ implies
that $(P, u) \in \mathrm{USE}$, $(P, v) \in \mathrm{MOD}$ and
v is directly modified depending on the value of u in P or
some subcalls of P.


The direct logical ripple effect relationship between a
pair of variable occurrences is defined as follows: An
occurrence of variable u in instruction $i_P$ may impose direct
logical ripple effect on an occurrence of variable v in
instruction $j_Q$ if and only if $i_P \in \mathrm{DEFS}(u, j_Q)$
and $(u, v) \in \mathrm{DIRECTMAPTO}_Q$. In other words, the
pair of variable definitions at two ends of any use-definition
chain are said to have direct logical ripple effect from one to
the other.


The direct logical ripple effect relationship between a
pair of procedures is defined as follows: Procedure P may
impose direct logical ripple effect on procedure Q if and only
if there exists at least one variable occurrence in P which may
impose direct logical ripple effect on a variable occurrence in
Q.

DIRECTRIP is defined as a relation in PP such that
$(P, Q) \in \mathrm{DIRECTRIP}$ implies that P may impose
direct logical ripple effect on Q.


The logical ripple effect relationship between a pair of
variable occurrences is defined as follows: An occurrence of
variable u may impose logical ripple effect on an occurrence of
variable v if and only if there exists a sequence of variable
occurrences $x_1, x_2, \ldots, x_n$ such that

$$u = x_1, \quad v = x_n,$$

and $x_i$ may impose direct ripple effect on $x_{i+1}$ for
$1 \le i \le n-1$.

The logical ripple effect relationship between a pair of
procedures is defined as follows: Procedure P may impose
logical ripple effect on procedure Q if and only if there
exists at least one variable occurrence in P which may impose
logical ripple effect on a variable occurrence in Q.

RIP is defined as a relation in PP such that
$(P, Q) \in \mathrm{RIP}$ implies that P may impose logical
ripple effect on Q.


7.6.2 Logical ripple effect analysis for metrics calculation

We start with the assumption that no intra-procedural
control flow information will be taken into consideration.
This is to simulate the situation in the program design phase,
where procedures are often viewed as black boxes performing
certain functions on interface variables only. Therefore, the
algorithm may also be applied in the design phase.

Also, we stress again that the logical ripple effect
analysis approach presented in this section emphasizes
efficiency in computing the logical stability measure.
Therefore, the accuracy of the logical ripple effect analysis
proposed in this section is somewhat less than that in
Section 5. One way of trading accuracy for efficiency is to
ignore control flow.

In the following section, for the sake of simplicity, we
will not consider mechanisms that may introduce dynamic
aliasing among variables, such as reference parameter passing.


7.6.2.1 No Control Flow - No Sharing

From the definitions and the assumption that any
intra-procedural execution sequence is possible, we can show
that a procedure P may impose direct logical ripple effect on
procedure Q if and only if there exists a nonempty set
$\mathrm{RIPVAR}_{(P,Q)}$ of variables such that
$(P, v) \in \mathrm{DIRECTMOD}$, $(Q, v) \in \mathrm{DIRECTUSE}$,
and $(P, Q) \in (\mathrm{CALL} \cup \mathrm{CALL}^{T} \cup \{(P,P)\})^{*}$
for every $v \in \mathrm{RIPVAR}_{(P,Q)}$. Hence, we can compute
DIRECTRIP as follows:

$$\mathrm{DIRECTRIP} = (\mathrm{DIRECTMOD}\;\mathrm{DIRECTUSE}^{T}) \cap (\mathrm{CALL} \cup \mathrm{CALL}^{T} \cup \{(P,P) \mid P \in PP\})^{*}. \qquad (7.1)$$

Note that when the call graph is connected, (7.1) may be
simplified and become

$$\mathrm{DIRECTRIP} = \mathrm{DIRECTMOD}\;\mathrm{DIRECTUSE}^{T}. \qquad (7.2)$$
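Formula (7.1) can be evaluated directly with relations stored as sets of pairs. The sketch below is only an illustration of the formula, not the tool used in the experiments; the helper names and the four-procedure example (including the never-called procedure LOG) are hypothetical.

```python
def compose(r, s):
    """Relational composition: (a, c) whenever (a, b) in r and (b, c) in s."""
    return {(a, c) for (a, b1) in r for (b2, c) in s if b1 == b2}


def transpose(r):
    return {(b, a) for (a, b) in r}


def star(r, universe):
    """Reflexive transitive closure of r over the given universe."""
    closure = set(r) | {(x, x) for x in universe}
    while True:
        extra = compose(closure, closure) - closure
        if not extra:
            return closure
        closure |= extra


# Hypothetical program: MAIN calls READ and CALC; LOG is never called.
PP = {"MAIN", "READ", "CALC", "LOG"}
CALL = {("MAIN", "READ"), ("MAIN", "CALC")}
DIRECTMOD = {("READ", "buf"), ("CALC", "total"), ("LOG", "total")}
DIRECTUSE = {("CALC", "buf"), ("MAIN", "total")}

# (7.2): every modifier/user pair that shares a variable.
unrestricted = compose(DIRECTMOD, transpose(DIRECTUSE))

# (7.1): keep only pairs that lie in the same component of the call graph.
connected = star(CALL | transpose(CALL), PP)
DIRECTRIP = unrestricted & connected
print(sorted(DIRECTRIP))
```

Here the pair (LOG, MAIN) appears in the unrestricted product but is removed by the intersection in (7.1), because LOG is not connected to MAIN in the call graph. When the call graph is connected the intersection removes nothing, which is the simplification stated in (7.2).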

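Then, for each P in PP, we can generate the sets

$$\mathrm{MAP}[P] = \{ P_{u,v} \mid (u, v) \in \mathrm{DIRECTMAPTO}_{P} \}, \qquad (7.3)$$

$$\mathrm{MAP} = \bigcup_{P \in PP} \mathrm{MAP}[P]. \qquad (7.4)$$

Now, define RIPPLE1 as a relation in MAP as follows:

$$\mathrm{RIPPLE1} = \{ (P_{u,v}, Q_{v,w}) \mid (P, Q) \in \mathrm{DIRECTRIP},\; P_{u,v}, Q_{v,w} \in \mathrm{MAP} \}. \qquad (7.5)$$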

This relation is essentially the combination of relation
DIRECTRIP with information stored in relation MAPTO.

Finally, we can calculate the relation RIPPLE in MAP by
the formula

$$\mathrm{RIPPLE} = \mathrm{RIPPLE1}^{*}. \qquad (7.6)$$
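Steps (7.3) through (7.6) can be sketched as follows. The code assumes that DIRECTRIP has already been obtained, and it represents each element $P_{u,v}$ of MAP as a tuple (P, u, v); that representation and all example names are assumptions made for illustration only.

```python
def build_map(direct_mapto):
    """(7.3) and (7.4): one element (P, u, v) per pair (u, v) in DIRECTMAPTO_P."""
    return {(p, u, v) for p, pairs in direct_mapto.items() for (u, v) in pairs}


def build_ripple1(direct_rip, map_all):
    """(7.5): link P_{u,v} to Q_{v,w} when (P, Q) is in DIRECTRIP."""
    return {
        (a, b)
        for a in map_all
        for b in map_all
        if (a[0], b[0]) in direct_rip and a[2] == b[1]
    }


def star(rel, universe):
    """(7.6): reflexive transitive closure of rel over the universe."""
    closure = set(rel) | {(x, x) for x in universe}
    while True:
        extra = {
            (a, d) for (a, b) in closure for (c, d) in closure if b == c
        } - closure
        if not extra:
            return closure
        closure |= extra


# Hypothetical input: READ maps dev to buf, CALC maps buf to total,
# MAIN maps total to report.
DIRECTMAPTO = {
    "READ": {("dev", "buf")},
    "CALC": {("buf", "total")},
    "MAIN": {("total", "report")},
}
DIRECTRIP = {("READ", "CALC"), ("CALC", "MAIN")}

MAP = build_map(DIRECTMAPTO)
RIPPLE1 = build_ripple1(DIRECTRIP, MAP)
RIPPLE = star(RIPPLE1, MAP)
print(len(MAP), len(RIPPLE1), len(RIPPLE))
```

In this example a change to `dev` in READ ripples to `report` in MAIN through two direct steps, so the pair linking READ's element to MAIN's element is present in RIPPLE but not in RIPPLE1.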
It can be shown that the logical ripple effect relation
implied by RIPPLE is the most precise information we can have
under the assumption that intra-procedural control flow is not
considered and no dynamic aliasing condition among variables
may occur.

The relation DIRECTRIP can be viewed as the first level
inter-procedural ripple effect information, while the sets
MAPTO[P], for all $P \in PP$, account for all possible
intra-procedural ripple effect information. The way in which
RIPPLE is going to be derived in the following sections is
analogous to the approach used here. That is, after the
relation DIRECTRIP and the sets MAP[P] have been established,
the relation RIPPLE is calculated according to (7.5) and (7.6).
This provides a unified form for this approach which may be
applied to both the design and code levels. The derivation of
DIRECTRIP and MAPTO[P] may be different depending on various
conditions.


7.6.2.2 No Control Flow - Sharing

It has been shown by Barth [BART78] that the possible
aliasing relationships among variables caused by
call-by-reference parameter passing may be computed by the
expression $\mathrm{AFFECT}^{*} (\mathrm{AFFECT}^{*})^{T}$,
where the relation AFFECT was defined to be pairs of variables
representing the formal-actual reference binding at some point
of call. Static aliasing relations can be represented by a set
EQU of equivalence classes in a slightly different form such
that a pair (u, v) is in EQU if and only if u and v are both in
the same equivalence class. EQU can be initialized according to
the static aliasing conditions such as REDEFINE in programming
language PL/I. Thus, the aliasing relation among variables
ALIAS may be computed by

$$\mathrm{ALIAS} = \mathrm{EQU}^{*}\,\mathrm{AFFECT}^{*}\,(\mathrm{AFFECT}^{*})^{T}. \qquad (7.7)$$

Now we can replace (7.1) by

$$\mathrm{DIRECTRIP} = (\mathrm{DIRECTMOD}\;\mathrm{ALIAS}\;\mathrm{DIRECTUSE}^{T}) \cap (\mathrm{CALL} \cup \mathrm{CALL}^{T} \cup \{(P,P) \mid P \in PP\})^{*}. \qquad (7.8)$$

The correctness of (7.8) can be justified easily. Analogous to
(7.2), when the call graph is connected, (7.8) can be
simplified and becomes

$$\mathrm{DIRECTRIP} = \mathrm{DIRECTMOD}\;\mathrm{ALIAS}\;\mathrm{DIRECTUSE}^{T}. \qquad (7.9)$$
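The effect of inserting ALIAS into the product can be seen on a very small example. The sketch below evaluates (7.7) and (7.9) with relations as sets of pairs. It assumes that each AFFECT pair is written as (formal, actual), which is a guess about the orientation used in [BART78], and it takes EQU to be the identity because the example has no static aliasing. All names are hypothetical.

```python
def compose(r, s):
    """Relational composition: (a, c) whenever (a, b) in r and (b, c) in s."""
    return {(a, c) for (a, b1) in r for (b2, c) in s if b1 == b2}


def transpose(r):
    return {(b, a) for (a, b) in r}


def star(r, universe):
    """Reflexive transitive closure of r over the given universe."""
    closure = set(r) | {(x, x) for x in universe}
    while True:
        extra = compose(closure, closure) - closure
        if not extra:
            return closure
        closure |= extra


VV = {"x", "y", "a"}
# Assumed orientation: formals x and y are both bound by reference to actual a.
AFFECT = {("x", "a"), ("y", "a")}
EQU = {(v, v) for v in VV}  # no static aliasing in this example

affect_star = star(AFFECT, VV)
# (7.7); EQU is already reflexive and transitive here, so EQU* equals EQU.
ALIAS = compose(star(EQU, VV), compose(affect_star, transpose(affect_star)))

DIRECTMOD = {("P", "x")}
DIRECTUSE = {("Q", "y")}

without_alias = compose(DIRECTMOD, transpose(DIRECTUSE))               # (7.2)
with_alias = compose(compose(DIRECTMOD, ALIAS), transpose(DIRECTUSE))  # (7.9)
print(sorted(without_alias), sorted(with_alias))
```

Procedure P modifies `x` and Q uses `y`; the two names never coincide, so (7.2) reports no ripple, whereas (7.9) reports (P, Q) because both formals may refer to the same actual.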

The precision of (7.9) may be further improved if it is
possible to take into consideration the durations of the
dynamic aliasing relations implied by
$\mathrm{AFFECT}^{*} (\mathrm{AFFECT}^{*})^{T}$. (7.3)
thru (7.6) may now be applied to compute the final matrix
RIPPLE.

Note that the basic relations, namely DIRECTMOD,
DIRECTUSE, DIRECTMAPTO, and CALL, which are needed in the
algorithm, are all local information to the procedure and so
can be easily constructed from a design document.

The dominant factor in the computational complexity of
this algorithm is the computation of the relation RIPPLE,
which is the same as the computation of the transitive
closure of an array of size |MAP|. The time bound for
computing the transitive closure of an array of size m is known
to be smaller than the order $O(m^{3})$. The best result known
to date is of the order $O(m^{2.495364})$ [COPP81]. This bound
depends on the size of the set MAP, which in the worst case can
be of the order $O(n^{2})$, where n is the length of the
program. But this is extremely unlikely in real situations.
Actually, based on some empirical data gathered from real
programs, $O(n)$ is a more realistic estimate for the size of
MAP. Therefore, this gives us an algorithm to compute the total
internal ripple effect of any program in a time bound which is
independent of the total number of branches in the program. At
the same time, since relations can be represented as a
bit-matrix, the space bound is manageable too.
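The bit-matrix representation mentioned above lends itself to a compact closure computation. The sketch below uses Warshall's algorithm with each row of the matrix packed into one Python integer; this is a standard cubic-time method chosen for illustration and is not the fast matrix multiplication technique of [COPP81].

```python
def transitive_closure_bits(rows):
    """Warshall's algorithm on a bit-matrix.

    rows[i] is an integer whose bit j is set when element i is related
    to element j. Returns the rows of the transitive closure.
    """
    rows = list(rows)
    n = len(rows)
    for k in range(n):
        mask = 1 << k
        for i in range(n):
            if rows[i] & mask:       # i reaches k ...
                rows[i] |= rows[k]   # ... so i reaches everything k reaches
    return rows


# Three elements of MAP numbered 0, 1, 2 with ripple edges 0 -> 1 and 1 -> 2.
ripple1 = [0b010, 0b100, 0b000]
closure = transitive_closure_bits(ripple1)
print([format(row, "03b") for row in closure])
```

Each inner step combines a whole row with a single bitwise OR, which is why a bit-matrix keeps both the running time and the storage modest for relations of the size discussed here. Setting bit i in row i beforehand would give the reflexive closure required by (7.6).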
7.6.2.3 Control Flow - Tracing

Suppose that we know the way of solving the use-definition
chains problem interprocedurally. Then the problem becomes
similar to the tracing phase in Section 5, and it may be solved
in the following manner: Let the set $\mathrm{RIP1}_P$ be

$$\mathrm{RIP1}_{P} = \{ Q \mid \exists\, v, i_P, j_Q \ \text{s.t.}\ j_Q \in \mathrm{DEFS}(v, i_P) \}.$$

RIP1 is in fact a simplified variation of DEFS. With RIP1
we can easily build DIRECTRIP as follows:

$$\mathrm{DIRECTRIP} = \{ (P, Q) \mid \exists\, v, i_P, j_Q \ \text{s.t.}\ i_P \in \mathrm{DEFS}(v, j_Q) \} = \{ (P, Q) \mid P \in \mathrm{RIP1}_{Q} \}. \qquad (7.10)$$

Sets MAP[P] may be derived by the formula

$$\mathrm{MAP}[P] = \{ P_{u,v} \mid (u, v) \in \mathrm{MAPTO}_{P} \cap (\mathrm{DIRECTUSE}^{T}\,\{(P,P)\}\,\mathrm{DIRECTMOD}) \}. \qquad (7.11)$$

(7.11) will select local variable pairs from $\mathrm{MAPTO}_P$,
and (7.4) thru (7.6) may be applied accordingly to yield the
relation RIPPLE. It can be shown that the information RIPPLE
derived in this manner is precise given that the summary
information MOD, USE, MAPTO and RIP1 are all precise.
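Formula (7.10) amounts to reading procedure pairs off the use-definition chains. The sketch below assumes that DEFS is available as a dictionary keyed by (variable, using instruction), with each instruction written as a (procedure, line) pair. How to obtain such chains interprocedurally is exactly the open problem noted in the text, so the data here are hypothetical.

```python
from collections import defaultdict

# Hypothetical use-definition chains: DEFS[(v, j_Q)] is the set of
# instructions i_P that may hold the most recent definition of v at j_Q.
DEFS = {
    ("buf", ("CALC", 3)): {("READ", 7)},
    ("total", ("MAIN", 12)): {("CALC", 5), ("MAIN", 2)},
}


def build_rip1(defs):
    """RIP1[Q]: procedures containing a definition that reaches a use in Q."""
    rip1 = defaultdict(set)
    for (_variable, (user, _line)), definitions in defs.items():
        for definer, _def_line in definitions:
            rip1[user].add(definer)
    return dict(rip1)


def build_directrip(rip1):
    """(7.10): (P, Q) is in DIRECTRIP exactly when P is in RIP1[Q]."""
    return {(p, q) for q, definers in rip1.items() for p in definers}


RIP1 = build_rip1(DEFS)
DIRECTRIP = build_directrip(RIP1)
print(sorted(DIRECTRIP))
```

Because the chains already encode which definitions can reach which uses, no intersection with the call graph is needed here, in contrast to (7.1).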


7.6.3 Conclusion


The technique presented here is a somewhat less accurate
approach for logical ripple effect analysis than that presented
in Section 5.1, but it is more efficient. It is suitable for
calculating software metrics because the measure itself is only
an estimate of some aspect of the software quality. Precise
information is welcome, but sometimes it is too expensive to
generate. Approximate information is thus a practical
alternative and should not greatly affect the quality of the
validation of the metrics. Another advantage of this technique
is that it may be applied at both the design and code levels
using the same algorithm. This should yield a substantial
saving in the effort of constructing tools for validating the
measures, and it ensures consistency between measures on
different levels.

Although the technique is incomplete in the sense that an
efficient way of obtaining the set RIP1 still needs to be
developed, it may be used without considering the control flow
within modules, which is less precise. Besides the search for
an efficient way to generate the set RIP1, more experiments are
needed to give some empirical evidence that the measures
calculated in this manner do not differ much from those from
the original computation.
7.7 Discussion and Future Work

Metrics for the primary attributes which affect software
modifiability are in various stages of development and
validation. Metrics for logical stability have been developed
and partially validated. Metrics for performance stability,
module strength and coupling have been defined. A framework for
an efficient logical ripple effect analysis approach at both
the design and the code levels has been established.

Future work is needed to identify and examine all
important software attributes which affect software
modifiability and reusability, and to develop a way of
combining these attributes into quantitative measures of
modifiability and reusability. To achieve this goal, we should
develop and validate the metrics related to modifiability,
including the metrics for performance stability, complexity,
module strength and coupling. Validation and refinement of all
related metrics, including modifiability itself, need to be
completed by performing a series of comprehensive experiments.
Furthermore, significant attributes related to reusability,
including portability, need to be identified and examined.




8.0 REFERENCES


[AHO72]   Aho, A. V. and Ullman, J. D., The Theory of Parsing,
          Translation and Compiling, Vol. II, Prentice-Hall,
          Englewood Cliffs, New Jersey, 1972.

[ALFO77]  Alford, M. W., "A Requirements Engineering
          Methodology for Real-Time Processing Requirements",
          IEEE Trans. on Software Engineering, Vol. SE-3, No. 1,
          Jan. 1977, pp. 60-69.

[ALLE74]  Allen, F. E., "Interprocedural Data Flow Analysis",
          IFIP 74, North-Holland Pub. Co., Amsterdam, 1974, pp.
          398-402.

[ARTH81]  Arthur, J. and Ramanathan, J., "Design of Analyzers
          for Selective Program Analysis", IEEE Trans. on
          Software Engineering, Vol. SE-7, No. 1, Jan. 1981,
          pp. 39-51.

[BALZ69]  Balzer, R. M., "EXDAMS - Extendable Debugging and
          Monitoring System", Proc. AFIPS 1969 Spring Joint
          Computer Conf., 1969, pp. 567-580.

[BART78]  Barth, J. M., "A Practical Interprocedural Data Flow
          Analysis Algorithm", Comm. ACM, Vol. 21, No. 9, Sept.
          1978, pp. 724-736.

[BELF77]  Belford, P. C., Donahoe, J. D. and Heard, U. J., "An
          Evaluation of the Effectiveness of Software
          Engineering Techniques", Digest of Papers, COMPCON 77
          (Fall), pp. 259-269.

[BOEH73]  Boehm, B. W., "Software and Its Impact: A
          Quantitative Assessment", Datamation, May 1973, pp.
          48-59.

[BOWL83]  Bowles, A. J., Effects of Design Complexity on
          Software Maintenance, Ph.D. Dissertation, Dept. of
          Electrical Engineering and Computer Science,
          Northwestern University, June 1983.

[BOYD78]  Boyd, D. and Pizzarello, A., "Introduction to the
          WELLMADE Design Methodology", Proc. 3rd Int'l. Conf.
          on Software Engineering, 1978, pp. 94-100.


[BRUN68]  Bruning, J. L. and Kintz, B. L., Computational
          Handbook of Statistics, Scott, Foresman and Company,
          Glenview, IL, 1968.

[CHAP79]  Chapin, N., "A Measure of Software Complexity", AFIPS
          National Computer Conference, pp. 995-1002, Spring
          1979.

[CLAR76]  Clarke, L. A., "A System to Generate Test Data and
          Symbolically Execute Programs", IEEE Trans. on
          Software Engineering, Vol. SE-2, No. 3, Sept. 1976,
          pp. 215-222.

[CLAU79]  Claus, V., Ehrig, H. and Rozenberg, G.,
          Graph-Grammars and Their Application to Computer
          Science and Biology, Lecture Notes in Computer
          Science 73, Springer-Verlag, 1979.

[COPP81]  Coppersmith, D. and Winograd, S., "On the Asymptotic
          Complexity of Matrix Multiplication : Extended
          Summary", Proc. 22nd Annual Symp. on Foundations of
          Computer Science, IEEE, Oct. 1981, pp. 82-90.

[DEMA78]  DeMarco, T., Structured Analysis and System
          Specification, Yourdon Inc., 1978.

[DEME81]  Demers, A., Reps, T. and Teitelbaum, T., "Incremental
          Evaluation for Attribute Grammars with Application to
          Syntax-directed Editors", Proc. 8th ACM Symp. on
          Principles of Programming Languages, 1981, pp.
          105-116.

[DUNS80]  Dunsmore, H. E. and Gannon, J. D., "Analysis of the
          Effects of Programming Factors on Programming
          Effort", Journal of Systems and Software, 1, pp.
          141-153, 1980.

[EJZA82]  Ejzak, R. P., Strength and Coupling Metrics of
          Software Structure, M.S. Thesis, Department of
          Electrical Engineering and Computer Science,
          Northwestern University, Evanston, Illinois, August
          1982.

[FAIR75]  Fairley, R. E., "An Experimental Program Testing
          Facility", IEEE Trans. on Software Engineering, Vol.
          SE-1, No. 4, Dec. 1975, pp. 350-357.


[FISC77]  Fischer, K. F., "A Test Case Selection Method for the
          Validation of Software Maintenance Modification",
          Proc. 1st Int'l. Conf. on Computer Software and
          Applications (COMPSAC 77), 1977, pp. 421-426.

[FOSD76]  Fosdick, L. D. and Osterweil, L. J., "Data Flow
          Analysis in Software Reliability", ACM Computing
          Surveys, Vol. 8, No. 3, Sept. 1976, pp. 305-330.

[GOOD75]  Goodenough, J. B. and Gerhart, S. L., "Towards a
          Theory of Test Data Selection", IEEE Trans. on
          Software Engineering, Vol. SE-1, No. 2, June 1975,
          pp. 156-173.

[HALL78]  Hallin, T. and Hansen, R., "Towards a Better Method
          of Software Testing", Proc. 2nd Int'l. Computer
          Software and Applications Conf. (COMPSAC 78), 1978,
          pp. 153-157.

[HAY74]   Hay, G. G., "Formal Definition of a Simple On-line
          Teleprocessor in VDL", in Programming Symposium,
          Paris 1974, Lecture Notes in Computer Science 19,
          Springer-Verlag, 1974.

[HECH77]  Hecht, M. S., Flow Analysis of Computer Programs,
          North-Holland, 1977.

[HENI79]  Heninger, K. L., "Specifying Software Requirements
          for Complex Systems", Proc. Specifications of
          Reliable Software, 1979, pp. 1-14.

[HENI80]  Heninger, K. L., "Specifying Software Requirements
          for Complex Systems: New Techniques and Their
          Applications", IEEE Trans. on Software Engineering,
          Vol. SE-6, No. 1, Jan. 1980, pp. 2-12.

[HOWD78]  Howden, W. E., "DISSECT - A Symbolic Evaluation and
          Program Testing System", IEEE Trans. on Software
          Engineering, Vol. SE-4, No. 1, Jan. 1978, pp. 70-73.

[HSIE82]  Hsieh, C. C., An Approach to Logical Ripple Effect
          Analysis for Software Maintenance, Ph.D.
          Dissertation, Department of Electrical Engineering
          and Computer Science, Northwestern University,
          Evanston, Ill., June 1982.


[ICHB79]  Ichbiah, J., et al, "Preliminary ADA Reference
          Manual", ACM SIGPLAN Notices, Vol. 14, No. 6, June
          1979, Section 5.2.3.

[JACK75]  Jackson, M. A., Principles of Program Design,
          Academic Press, 1975.

[JANS80]  Janssens, D. and Rozenberg, G., "Node-Label
          Controlled Graph Grammars", Proc. 9th Symp. on
          Mathematical Foundations of Computer Science,
          Lecture Notes in Computer Science 80,
          Springer-Verlag, 1980.

[JENS74]  Jensen, K. and Wirth, N., Pascal User Manual and
          Report, Springer-Verlag, New York, 1974.

[KAFU81]  Kafura, D. G. and Henry, S. M., "Software Quality
          Metrics Based on Interconnectivity", Journal of
          Systems and Software, Vol. 2, No. 2, June 1981, pp.
          121-131.

[KING76]  King, J. C., "Symbolic Execution and Program
          Testing", Comm. ACM, Vol. 19, No. 7, July 1976, pp.
          385-394.

[LEE72]   Lee, J. A. N., Computer Semantics, Van Nostrand
          Reinhold, 1972.

[LIEN80]  Lientz, B. P. and Swanson, E. B., Software
          Maintenance Management, Addison-Wesley, 1980.

[LOME77]  Lomet, D. B., "Data Flow Analysis in the Presence of
          Procedure Calls", IBM Journal of Research and
          Development, Vol. 21, No. 6, Nov. 1977, pp. 559-571.

[MCCA76]  McCabe, T. J., "A Complexity Measure", IEEE Trans. on
          Software Engineering, Vol. SE-2, No. 6, Dec. 1976,
          pp. 308-320.

[MYER76]  Myers, G. J., Software Reliability: Principles and
          Practices, John Wiley and Sons Inc., 1976, pp.
          216-246.

[MYER78]  Myers, G. J., Composite/Structured Design, Van
          Nostrand Reinhold Company, New York, N.Y., 1978.

[PAGA81]  Pagan, F. G., Formal Specification of Programming
          Languages: A Panoramic Primer, Prentice-Hall, 1981.




[RAMA76] Ramamoorthy, C. V., Ho, S. F. and Chen, W. T., "On
the Automated Generation of Program Test Data", IEEE
Trans. on Software Engineering, Vol. SE-2, No. 4
(Dec. 1976), pp. 293-300.

[RICH81] Richardson, D. and Clarke, L. A., "A Partition
Analysis Method to Increase Program Reliability",
Proc. 5th Int'l. Conf. on Software Engineering, 1981,
pp. 244-253.

[ROSE79] Rosen, B. K., "Data Flow Analysis for Procedural
Languages", Journal of ACM, Vol. 26, No. 2, April
1979, pp. 322-344.

[ROSS77] Ross, D. T. and Schoman, K. E. Jr., "Structured
Analysis for Requirements Definition", IEEE Trans. on
Software Engineering, Vol. SE-3, No. 1, Jan. 1977,
pp. 6-15.

[STAY76] Stay, J. F., "HIPO and Integrated Program Design",
IBM Systems Journal, Vol. 15, No. 2, 1976, pp.
143-154.

[SWAN76] Swanson, E. B., "The Dimensions of Maintenance",
Proc. 2nd Int'l. Conf. on Software Engineering, 1976,
pp. 492-497.

[TEIT81] Teitelbaum, T., Reps, T. and Horwitz, S., "The Why
and Wherefore of the Cornell Program Synthesizer",
ACM SIGPLAN Notices, Vol. 16, No. 6, June 1981, pp.
8-16.

[WASS80] Wasserman, A. I., "Testing and Verification Aspects
of Pascal-like Languages", Tutorial on Programming
Language Design, IEEE Computer Society Press, 1980,
pp. 61-75.

[WASS82] Wasserman, A. I., "The Future of Programming", Comm.
ACM, Vol. 25, No. 3, March 1982, pp. 196-206.

[WEGN78] Wegner, P., "Research Directions in Software
Technology", Proc. 3rd Int'l. Conf. on Software
Engineering, 1978, pp. 243-263.

[WEIS81] Weiser, M., "Program Slicing", Proc. 5th Int'l. Conf.
on Software Engineering, 1981, pp. 439-449.




[WEIS82] Weiser, M., "Programmers Use Slices When Debugging",
Comm. ACM, Vol. 25, No. 7, July 1982, pp. 446-452.

[WEYU80] Weyuker, E. and Ostrand, T., "Theories of Program
Testing and the Application of Revealing Subdomains",
IEEE Trans. on Software Engineering, Vol. SE-6, No.
3, May 1980, pp. 236-246.

[WIRT71] Wirth, N., "Program Development by Stepwise
Refinement", Comm. ACM, Vol. 14, No. 4, April 1971,
pp. 221-227.

[WIRT75] Wirth, N., "Pascal-S: A Subset and its
Implementation", Technical Report 12, Institut fuer
Informatik, ETH Zuerich, 1975.

[YAU78] Yau, S. S., Collofello, J. S. and MacGregor, "Ripple
Effect Analysis for Software Maintenance", Proc. 2nd
Int'l. Conf. on Computer Software and Applications
(COMPSAC 78), 1978, pp. 60-65.

[YAU80a] Yau, S. S., Self-Metric Software - Summary of
Technical Programs, Final Technical Report
RADC-TR-80-138, Vol. I (of 3), NTIS AD-A0386-290,
April 1980.

[YAU80b] Yau, S. S., Collofello, J. S. and Hsieh, C. C.,
Self-Metric Software - A Handbook: Part I, Logical
Ripple Effect Analysis, Final Technical Report
RADC-TR-80-138, Vol. II (of 3), NTIS AD-A0386-291,
April 1980.

[YAU80c] Yau, S. S. and Collofello, J. S., Self-Metric
Software - A Handbook: Part II, Performance Ripple
Effect Analysis, Final Technical Report
RADC-TR-80-138, Vol. III (of 3), NTIS AD-A0386-292,
April 1980.

[YAU80d] Yau, S. S. and Grabow, P. C., "A Model for
Representing the Control Flow and Data Flow of
Program Modules", Proc. 4th Int'l. Conf. on Computer
Software and Applications (COMPSAC 80), 1980, pp.
153-160.

[YAU80e] Yau, S. S. and Collofello, J. S., "Some Stability
Measures for Software Maintenance", IEEE Trans. on
Software Engineering, Vol. SE-6, No. 6, Nov. 1980,
pp. 545-552. The preliminary version of this paper
appeared in Proc. 3rd Int'l. Conf. on Computer
Software and Applications (COMPSAC 79), 1979,
pp. 606-611.


[YAU80f] Yau, S. S. and Collofello, J. S., Performance Ripple
Effect Analysis for Large-Scale Software Maintenance,
Technical Report RADC-TR-80-55, NTIS AD-A0304-351,
March 1980.

[YAU81a] Yau, S. S. and Grabow, P. C., "A Model for
Representing Programs Using Hierarchical Graphs",
IEEE Trans. on Software Engineering, Vol. SE-7, No. 6,
Nov. 1981, pp. 556-574.

[YAU81b] Yau, S. S., Carvalho, M. B. and Nicholl, R. A., "A
Method for Estimating the Execution Time of Arbitrary
Paths in Programs", Proc. 5th Int'l. Conf. on Computer
Software and Applications (COMPSAC 81), 1981, pp.
225-239.

[YAU82a] Yau, S. S., Chang, C. K., Hsieh, C.-C., Kishimoto, Z.
and Nicholl, R. A., "A Methodology for Software
Maintenance", Proc. Int'l. Computer Symposium,
Taiwan, 1982, pp. 447-458.

[YAU82b] Yau, S. S., Grabow, P. C. and Weems, B. P., "A Binary
Representation for the Hierarchical Program Model",
Proc. 6th Int'l. Conf. on Computer Software and
Applications (COMPSAC 82), 1982, pp. 188-195.

[YAU82c] Yau, S. S. and Collofello, J. S., "Design Stability
Measures for Software Maintenance", Proc. 6th Int'l.
Conf. on Computer Software and Applications (COMPSAC
82), 1982, pp. 100-108.

[ZAVE81] Zave, P. and Yeh, R. T., "Executable Requirements for
Embedded Systems", Proc. 5th Int'l. Conf. on Software
Engineering, 1981, pp. 295-304.

[ZAVE82] Zave, P., "An Operational Approach to Requirements
Specification for Embedded Systems", IEEE Trans. on
Software Engineering, Vol. SE-8, No. 3, May 1982, pp.
250-269.


[ZELK78] Zelkowitz, M., "Perspectives on Software
Engineering", ACM Computing Surveys, Vol. 10, No. 2,
June 1978, pp. 197-216.


9.0  PUBLICATIONS  AND  PRESENTATIONS 


Besides the results of the research presented in this
report, many results have already been published or presented
in preliminary or complete forms. The publications and
presentations are grouped in the following categories: (1)
papers, (2) technical reports, (3) presentations related to the
project, and (4) Ph.D. dissertations and M.S. theses.


9.1 Papers


1. S. S. Yau and J. S. Collofello, "Some Stability Measures
for Software Maintenance", IEEE Trans. on Software
Engineering, Vol. SE-6, No. 6, Nov. 1980, pp. 545-552.

2. S. S. Yau, M. B. Carvalho and R. A. Nicholl, "A Method for
Estimating the Execution Time of Arbitrary Paths in
Programs", Proc. 5th Int'l. Conf. on Computer Software and
Applications (COMPSAC 81), 1981, pp. 225-239.

3. S. S. Yau and J. S. Collofello, "Some Design Stability
Measures for Software Maintenance", Proc. 6th Int'l. Conf.
on Computer Software and Applications (COMPSAC 82), 1982,
pp. 100-108.

4. S. S. Yau, C. K. Chang, C.-C. Hsieh, Z. Kishimoto and R. A.
Nicholl, "A Methodology for Software Maintenance", Proc.
Int'l. Computer Symposium, Taiwan, December 15-17, 1982,
pp. 447-458.

5. S. S. Yau and C. C. Hsieh, "Ripple Effect Analysis for
Large-Scale Software Maintenance I - Logical Ripple Effect
Analysis", submitted for publication.

6. S. S. Yau, J. S. Collofello and R. A. Nicholl, "Ripple
Effect Analysis for Large-Scale Software Maintenance II -
Performance Ripple Effect Analysis", submitted for
publication.


7. S. S. Yau, C. K. Chang and R. A. Nicholl, "An Approach to
Incremental Program Modification", submitted for
publication.

8. S. S. Yau and Z. Kishimoto, "A Method for Revalidating
Programs in the Maintenance Phase - Module Testing",
submitted for publication.


9.2 Presentations


1. * S. S. Yau, "Methodologies for Large-Scale Software
Maintenance", Seminar, Bell Telephone Laboratories,
Naperville, Illinois, July 1, 1980.

2. S. S. Yau, "Performance Stability Measures for Software
Maintenance", 3rd Minnowbrook Workshop on Software
Performance Evaluation, Blue Mountain Lake, New York,
August 19-21, 1980.

3. * S. S. Yau, "Methodologies for Distributed Computing
System Software Design", Seminar, Fujitsu Laboratories,
Kanagawa-Ken, Japan, October 9, 1980.

4. * S. S. Yau, "Methodologies for Large-Scale Software
Maintenance", Seminar, Hitachi Systems Engineering Co.,
Yokohama, Japan, October 13, 1980.

5. * S. S. Yau, "A Model for Representing the Control Flow and
Data Flow of Program Modules", COMPSAC 80, Chicago,
Illinois, October 27-31, 1980.

6. * S. S. Yau, "Critical Problem Areas in Software
Development", Technical Keynote Speech, Int'l. Computer
Symposium 80, Taipei, Taiwan, China, December 16-18, 1980.

7. * S. S. Yau, "Methodologies for Large-Scale Software
Maintenance", Seminar, Computer Science Division,
Department of Electrical Engineering and Computer Science,
University of California at Berkeley, February 25, 1981.

8. S. S. Yau, "A Semantic Program Model for Software
Maintenance", 4th Minnowbrook Workshop on Software
Performance Evaluation, Blue Mountain Lake, New York,
August 11-13, 1981.




9. R. A. Nicholl, "A Method for Estimating the Execution Time
of Arbitrary Paths in Programs", COMPSAC 81, Chicago,
Illinois, November 18-20, 1981.

10. J. S. Collofello, "Some Design Stability Measures for
Software Maintenance", COMPSAC 82, Chicago, Illinois,
November 10-12, 1982.

11. * C. K. Chang, "A Methodology for Software Maintenance",
Int'l. Computer Symposium, Taiwan, December 15-17, 1982,
pp. 447-458.


* These presentations and participation were made at no cost to
the contract.


9.3 Technical Reports

S. S. Yau, Methodology for Software Maintenance, RADC
Interim Report, July 1981.

9.4 Dissertations and Theses

A number of graduate students who have worked on this
contract completed their Ph.D. and M.S. degrees in the
Department of Electrical Engineering and Computer Science,
Northwestern University. Their Ph.D. dissertations and M.S.
thesis are listed below:

1. C. C. Hsieh, Logical Ripple Effect Analysis for Program
Modification, M.S. Thesis, June 1980.

2. Z. Kishimoto, Testing for Large-Scale Programs in the
Maintenance Phase, Ph.D. Dissertation, June 1982.


3. C. C. Hsieh, An Approach to Logical Ripple Effect Analysis
for Software Maintenance, Ph.D. Dissertation, June 1982.

4. C. K. Chang, Incremental Modification of Computer Programs,


10.0  TECHNICAL  PERSONNEL 

During the period of this study, the following
Northwestern University faculty and graduate students
contributed to the research effort of this contract:


Principal Investigator
and Project Director

Stephen S. Yau

Graduate Students


Z.  Kishimoto 
C.  C.  Hsieh 

B.  P.  Weems 

C.  K.  Chang 

R. A. Nicholl

S.  C.  Chang 
R.  E.  Ejzak 
Y.  C.  Chou 
R.  S.  Wang 


1980  1981  1982 


Starting 
April  23 


Ending 
Nov.  30 


XXX 


X 

X 

X 

X 

X 


X 

X 

X 

X 

X 

X 

X 

X 


X 

X 

X 

X 

X 

X 


In addition, Professor J. S. Collofello of Arizona State
University, who worked on the previous project, continued to
serve as a consultant to this contract for the work in the
areas of software metrics and performance ripple effect
analysis. Professor L. Clarke of the University of
Massachusetts served as a consultant in the area of testing.


11.0 APPENDIX


For the sake of completeness, we include the following
four published papers which contain some of the research
results supported by this contract:


1. S. S. Yau and J. S. Collofello, "Some Stability Measures
for Software Maintenance", IEEE Trans. on Software
Engineering, Vol. SE-6, No. 6, Nov. 1980, pp. 545-552.

2. S. S. Yau, M. B. Carvalho and R. A. Nicholl, "A Method for
Estimating the Execution Time of Arbitrary Paths in
Programs", Proc. 5th Int'l. Conf. on Computer Software and
Applications (COMPSAC 81), 1981, pp. 225-239.

3. S. S. Yau and J. S. Collofello, "Some Design Stability
Measures for Software Maintenance", Proc. 6th Int'l. Conf.
on Computer Software and Applications (COMPSAC 82), 1982,
pp. 100-108.

4. S. S. Yau, C. K. Chang, C. C. Hsieh, Z. Kishimoto and R. A.
Nicholl, "A Methodology for Software Maintenance", Proc.
Int'l. Computer Symposium, Taiwan, December 15-17, 1982,
pp. 447-458.


Some  Stability  Measures  for  Software  Maintenance 


STEPHEN S. YAU, FELLOW, IEEE, AND JAMES S. COLLOFELLO, MEMBER, IEEE


Abstract-Software maintenance is the dominant factor contributing
to the high cost of software. In this paper, the software maintenance
process and the important software quality attributes that affect the
maintenance effort are discussed. One of the most important quality
attributes of software maintainability is the stability of a program,
which indicates the resistance to the potential ripple effect that the
program would have when it is modified. Measures for estimating the
stability of a program and the modules of which the program is
composed are presented, and an algorithm for computing these stability
measures is given. An algorithm for normalizing these measures is also
given. Applications of these measures during the maintenance phase
are discussed along with an example. An indirect validation of these
stability measures is also given. Future research efforts involving
application of these measures during the design phase, program
restructuring based on these measures, and the development of an overall
maintainability measure are also discussed.

Index Terms - Algorithms, applications, logical stability, module
stability, maintenance process, normalization, potential ripple effect,
program stability, software maintenance, software quality attributes,
validation.

I. INTRODUCTION

IT IS well known that the cost of large-scale software systems
has become unacceptably high [1], [2]. Much of this
excessive software cost can be attributed to the lack of
meaningful measures of software. In fact, the definition of software
quality is very vague. Since some desired attributes of a program
can only be acquired at the expense of other attributes,
program quality must be environment dependent. Thus, it is
impossible to establish a single figure for software quality.
Instead, meaningful attributes which contribute to software
quality must be identified. Research results in this area have
contributed to the definition of several software quality
attributes, such as correctness, flexibility, portability,
efficiency, reliability, integrity, testability, and maintainability
[3]-[6]. These results are encouraging and provide a reasonably
strong basis for the definition of the quality of software.

Since software quality is environment dependent, some
attributes may be more desirable than others. One attribute
which is almost always desirable except in very limited
applications is the maintainability of the program. Software
maintenance is a very broad activity that includes error corrections,
enhancements of capabilities, deletion of obsolete capabilities,
and optimization [7]. The cost of these software maintenance
activities has been very high, and it has been estimated as ranging
from 40 percent [1] to 67 percent [2] of the total cost during
the life cycle of large-scale software systems. This very high
software maintenance cost suggests that the maintainability of
a program is a very critical software quality attribute. Measures
are needed to evaluate the maintainability of a program at
each phase of its development. These measures must be easily
calculated and subject to validation. Techniques must also be
developed to restructure the software during each phase of its
development in order to improve its maintainability.

In this paper, we will first discuss the software maintenance
process and the software quality attributes that affect the
maintenance effort. Because accommodating the ripple effect
of modifications in a program is normally a large portion of
the maintenance effort, especially for programs that are not well
designed [7], we will present some measures for estimating the
stability of a program, which is the quality attribute indicating
the resistance to the potential ripple effect which a program
would have when it is modified. Algorithms for computing
these stability measures and for normalizing them will be
given. Applications of these measures during the maintenance
phase along with an example are also presented. Future
research efforts involving the application of these measures
during the design phase, program restructuring based on these
measures, and the development of an overall maintainability
measure are also discussed.

II. THE MAINTENANCE PROCESS

As previously discussed, software maintenance is a very
broad activity. Once a particular maintenance objective
is established, the maintenance personnel must first
understand what they are to modify. They must then modify the
program to satisfy the maintenance objectives. After
modification, they must ensure that the modification does not
affect other portions of the program. Finally, they must test
the program. These activities can be accomplished in the
four phases as shown in Fig. 1.

The first phase consists of analyzing the program in order to
understand it. Several attributes such as the complexity of the
program, the documentation, and the self-descriptiveness of
the program contribute to the ease of understanding the
program. The complexity of the program is a measure of the
effort required to understand the program and is usually based
on the control or data flow of the program. The
self-descriptiveness of the program is a measure of how clear the program
is, i.e., how easy it is to read, understand, and use [5].

The second phase consists of generating a particular
maintenance proposal to accomplish the implementation of the
maintenance objective. This requires a clear understanding of both
the maintenance objective and the program to be modified.
However, the ease of generating maintenance proposals for a
program is primarily affected by the attribute extensibility. The
extensibility of the program is a measure of the extent to which
the program can support extensions of critical functions [5].

The third phase consists of accounting for all of the ripple
effect as a consequence of program modifications. In
software, the effect of a modification may not be local to the
modification, but may also affect other portions of the
program. There is a ripple effect from the location of the
modification to the other parts of the programs that are affected
by the modification [7]. One aspect of this ripple effect is
logical or functional in nature. Another aspect of this ripple
effect concerns the performance of the program. Since a
large-scale program usually has both functional and
performance requirements, it is necessary to understand the
potential effect of a program modification from both a logical and
a performance point of view [7]. The primary attribute
affecting the ripple effect as a consequence of a program
modification is the stability of the program. Program stability is
defined as the resistance to the amplification of changes in
the program.

The fourth phase consists of testing the modified program
to ensure that the modified program has at least the same
reliability level as before. It is important that cost-effective
testing techniques be applied during maintenance. The
primary factor contributing to the development of these
cost-effective techniques is the testability of the program.
Program testability is defined as a measure of the effort required
to adequately test the program according to some well defined
testing criterion.

Each of these four phases and their associated software
quality attributes are critical to the maintenance process. All
of these software quality attributes must be combined to form
a maintainability measure. One of the most important quality
attributes is the stability of the program. This fact can be
illustrated by considering a program which is easy to
understand, easy to generate modification proposals for, and easy
to test. If the stability of the program is poor, however, the
impact of any modification on the program is large. Hence,
the maintenance cost will be high and the reliability may also
suffer due to the introduction of possible new errors because
of the extensive changes that have to be made.

Although the potential benefits of a validated program
stability measure are great, very little research has been conducted
in this area. Previous stability measures have been developed
by Soong [9], Haney [10], and Myers [4]. There exist several
weaknesses in these measures which have prevented their wide
acceptance. Their largest problem has been the inability to
validate the measures due to model inputs that are
questionable or difficult to obtain. Other weaknesses of these
measures include an assumption that all modifications to a module
have the same ripple effect, a symmetry assumption that if
there exists a nonzero probability of having to change a
module i given that module j is changing then there exists a
nonzero probability of having to change module j given that
module i is changing, and a failure to incorporate a performance
component as part of the stability measure.

III. DEVELOPMENT OF LOGICAL STABILITY MEASURES

The stability of a program has been defined as the resistance
to the potential ripple effect that the program would have
when it is modified. Before considering the stability of a
program, it is necessary to develop a measure for the stability
of a module. The stability of a module can be defined as a
measure of the resistance to the potential ripple effect of a
modification of the module on other modules in the program.
There are two aspects of the stability of a module: the logical
aspect and the performance aspect. The logical stability of a
module is a measure of the resistance to the impact of such a
modification on other modules in the program in terms of
logical considerations. The performance stability of a module
is a measure of the resistance to the impact of such a
modification on other modules in the program in terms of performance
considerations. In this paper, logical stability measures will be
developed for a program and the modules of which the
program is composed. Performance stability measures are
currently under development and the results will be reported in
a subsequent paper. Both the logical and the performance
stability measures are being developed to overcome the
weaknesses of the previous stability measures. In addition, the
stability measures are being developed with the following
requirements to increase their applicability and acceptance:


1) ability to validate the measures,

2)  consistency  with  current  design  methodologies, 

3)  utilization  in  comparing  alternate  designs,  and 

4)  diagnostic  ability. 

It should be noted that the stability measures being described
are not in themselves indicators of program maintainability.
As previously mentioned, program stability is a significant
factor contributing to program maintainability. Although the
measures being described estimate program stability, they
must be utilized in conjunction with the other attributes
affecting program maintainability. For example, a single
module program of 20 000 statements will possess an excellent
program stability since there cannot be any ripple effect among
modules; however, the maintainability of the program will
probably be quite poor.

Development of a Module Logical Stability Measure

The logical stability of a module is a measure of the
resistance to the expected impact of a modification to the module
on other modules in the program in terms of logical
considerations. Thus, a computation of the logical stability of a module
must be based upon some type of analysis of the maintenance
activity which will be performed on the module. However,
due to the diverse and almost random nature of software
maintenance activities, it is virtually meaningless to attempt
to predict when the next maintenance activity will occur and
what this activity will consist of. Thus, it is impossible to
develop a stability measure based upon probabilities of what the
maintenance effort will consist of. Instead, the stability
measure must be based upon some subset of maintenance activity
for which the impact of the modifications can readily be
determined. For this purpose, a primitive subset of the
maintenance activity is utilized. This consists of a change to a single
variable definition in a module. This primitive subset of
maintenance activity is utilized because regardless of the
complexity of the maintenance activity, it basically consists of
modifications to variables in the modules. A logical stability measure
can then be computed based upon the impact of these
primitive modifications on the program. This logical stability
measure will accurately predict the impact of these primitive
modifications on the program and, thus, can be utilized to compute
the logical stability of the module with respect to the
primitive modifications.

Due to the nature of the logical stability of a module, an
analysis of the potential logical ripple effect in the program
must be conducted. There are two aspects of the logical ripple
effect which must be examined. One aspect concerns
intramodule change propagation. This involves the flow of program
changes within the module as a consequence of the
modification. The other aspect concerns intermodule change
propagation. This involves the flow of program changes across module
boundaries as a consequence of the modification.

Intramodule change propagation is utilized to identify the
set Z_ki of interface variables which are affected by logical
ripple effect as a consequence of a modification to variable
definition i in module k. This requires an identification of
which variables constitute the module's interfaces and a
characterization of the potential intramodule change
propagation among the variables in the module. The variables that
constitute the module's interfaces consist of its global
variables, its output parameters and its variables utilized as input
parameters to called modules. Each utilization of a variable
as an input parameter to a called module is regarded as a
unique interface variable. Thus, if variable x is utilized as an
input parameter in two module invocations, then each
occurrence of x is regarded as a unique interface variable. Each
occurrence must be regarded as a separate interface variable
since the complexity of affecting each occurrence of the
variable as well as the probability of affecting each
occurrence may differ.
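
The identification of interface variables described above can be sketched in a few lines of Python. This is only an illustrative sketch: the `Call` and `Module` records, their field names, and the tuple tags are hypothetical and do not come from the paper. It shows that globals and output parameters are listed once, while each use of a variable as an input parameter to a called module becomes its own interface variable.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Call:
    """One invocation of another module (hypothetical record)."""
    callee: str
    inputs: tuple  # names of variables passed as input parameters


@dataclass
class Module:
    """Minimal module description (hypothetical record)."""
    name: str
    global_vars: tuple = ()
    output_params: tuple = ()
    calls: list = field(default_factory=list)


def interface_variables(module):
    """List the interface variables of a module.

    Globals and output parameters appear once each. Every use of a
    variable as an input parameter to a called module is a separate
    entry, tagged with its call site and argument position.
    """
    iface = [("global", g) for g in module.global_vars]
    iface += [("output", o) for o in module.output_params]
    for site, call in enumerate(module.calls):
        for pos, var in enumerate(call.inputs):
            iface.append(("input", var, call.callee, site, pos))
    return iface


# Variable x is passed to two different invocations, so it yields
# two distinct interface variables.
m = Module(
    name="K",
    global_vars=("g",),
    output_params=("result",),
    calls=[Call("P", ("x",)), Call("Q", ("x", "y"))],
)
iface = interface_variables(m)
```

Tagging each input-parameter use with its call site keeps the two occurrences of x apart, which is what allows a different complexity and probability to be attached to each occurrence.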

Once an interface variable is affected, the flow of program
changes may cross module boundaries and affect other modules.
Intermodule change propagation is then utilized to compute
the set Xkj consisting of the set of modules involved in
intermodule change propagation as a consequence of affecting
interface variable j of module k. In the worst case logical
ripple effect analysis, Xkj is calculated by first identifying all the
modules for which j is an input parameter or global variable.
Then, for each of these modules in Xkj, the intramodule
change propagation emanating from j is traced to the interface
variables within the module. Intermodule change propagation
is then utilized to identify other modules affected and these
are added to Xkj. This continues until the ripple effect terminates
or no new modules can be added to Xkj. An algorithm
for performing this worst case ripple effect has already been
developed [7], [8].
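
The worst case tracing described above can be sketched as a fixed-point
search. This is only an illustration under simplifying assumptions, not
the algorithm of [7], [8]: the function name, the two lookup tables, and
the four-module program are all hypothetical, and intramodule propagation
is assumed to have been precomputed.

```python
from collections import deque


def intermodule_propagation(start, consumers, reaches):
    """Worst-case set of modules affected once interface variable `start` changes.

    start     -- (module, variable) pair naming the affected interface variable
    consumers -- maps an interface variable to the modules that receive it as
                 an input parameter or global variable (assumed given)
    reaches   -- maps (module, incoming variable) to the interface variables of
                 that module reached by intramodule change propagation
    """
    affected = set()
    seen = {start}
    pending = deque([start])
    while pending:
        var = pending.popleft()
        for module in consumers.get(var, ()):
            affected.add(module)
            for out_var in reaches.get((module, var), ()):
                if out_var not in seen:  # stop once nothing new can be added
                    seen.add(out_var)
                    pending.append(out_var)
    return affected


# Hypothetical program: MAIN passes x to A; A then changes global g,
# which is shared by B and MAIN. A fourth module C is never reached.
consumers = {
    ("MAIN", "x"): {"A"},
    ("A", "g"): {"B", "MAIN"},
}
reaches = {
    ("A", ("MAIN", "x")): {("A", "g")},
}
x_main_x = intermodule_propagation(("MAIN", "x"), consumers, reaches)
print(sorted(x_main_x))  # prints ['A', 'B', 'MAIN']
```

The `seen` set is what guarantees termination when modules share
variables cyclically.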

The  worst  case  ripple  effect  tracing  can  significantly  be  re¬ 
fined  if  explicit  assumptions  exist  for  each  module  in  the  pro¬ 
gram  for  its  input  parameters  or  global  variables.  Intermodule 
change  propagation  tracing  would  then  examine  if  a  module's 
assumptions  have  been  violated  to  determine  whether  it  should 
become  a  part  of  the  change  propagation.  If  a  module's  as¬ 
sumptions  have  not  been  violated,  then  the  ripple  effect  will 
not  affect  the  module. 

There  are  many  possible  approaches  to  refining  the  worst 
case  ripple  effect  which  would  not  require  a  complete  set  of 
assumptions  made  for  each  interface  variable  for  every  mod¬ 
ule.  For  example,  a  significant  refinement  to  the  worst  case 
change  propagation  can  result  by  utilizing  the  simple  ap¬ 
proach  of  examining  whether  or  not  a  module  makes  any 
assumptions  about  the  values  of  its  interface  variables.  These 
assumptions  can  be  expressed  as  program  assertions.  If  it 
does  not  make  any  assumptions  about  the  values  of  its  inter¬ 
face  variables,  then  the  module  cannot  be  affected  by  inter¬ 
module  change  propagation.  However,  if  it  does  make  an 
assumption  about  the  value  of  an  interface  variable,  then 
the worst case is automatically in effect and the module is
placed  in  the  change  propagation  resulting  from  affecting 
the  interface  variable  if  the  interface  variable  is  also  in  the 
change  propagation  as  a  consequence  of  some  modification. 
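
A minimal sketch of this simple refinement follows, assuming that each
module's assertions have already been reduced to the set of interface
variables they mention. The function name and the data are hypothetical.

```python
def refine_consumers(consumers, assumptions):
    """Keep only the consumers that assume something about the variable.

    consumers   -- maps an interface variable to the modules receiving it
    assumptions -- maps a module to the interface variables about whose
                   values it makes at least one assumption (assertion)
    """
    return {
        var: {m for m in modules if var in assumptions.get(m, set())}
        for var, modules in consumers.items()
    }


# Hypothetical data: global g is visible to P, Q and R, but only P
# asserts anything about its value.
consumers = {"g": {"P", "Q", "R"}}
assumptions = {"P": {"g"}, "Q": set()}  # R states no assumptions at all
refined = refine_consumers(consumers, assumptions)
print(refined)  # prints {'g': {'P'}}
```

Modules that survive the filter are still treated as worst case, as the
text states; the refinement only removes modules that cannot be affected.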

Both intramodule and intermodule change propagation
must be utilized to compute the expected impact of a primitive
modification to a module on other modules in the program.
A measure is needed to evaluate the magnitude of this
logical ripple effect which occurs as a consequence of modifying
a variable definition. This measure must be associated
contains a unique entry for each definition. The set Vk is
created by scanning the source code of module k and adding
variables which satisfy any of the following criteria to Vk.

a)  The  variable  is  defined  in  an  assignment  statement. 

b)  The  variable  is  assigned  a  value  which  is  read  as  input. 

c) The variable is an input parameter to module k.

d) The variable is an output parameter from a called module.

e)  The  variable  is  a  global  variable. 

Step  2:  For  each  module  k,  identify  the  set  Tk  of  all  inter¬ 
face  variables  in  module  k.  The  set  Tk  is  created  by  scanning 
the  source  code  of  module  k  and  adding  variables  which  satisfy 
any  of  the  following  criteria  to  Tk . 

a)  The  variable  is  a  global  variable. 

b)  The  variable  is  an  input  parameter  to  a  called  module. 
Each  utilization  of  a  variable  as  an  input  parameter  to  a  called 
module  is  regarded  as  a  unique  interface  variable.  Thus,  if 
variable  x  is  utilized  as  an  input  parameter  in  two  module 
invocations,  then  each  occurrence  of  x  is  regarded  as  a  unique 
interface  variable. 

c)  The  variable  is  an  output  parameter  of  module  k. 

Step  3:  For  each  variable  definition  i  in  every  module  k, 
compute  the  set  Zki  of  interface  variables  in  Tk  which  are 
affected by a modification to variable definition i of module k
by intramodule change propagation [7], [8].

Step 4: For each interface variable j in every module k,
compute the set Xkj consisting of the modules in intermodule
change propagation as a consequence of affecting interface
variable j of module k.

Step  5:  For  each  variable  definition  i  in  every  module  k, 
compute  the  set  Wki  consisting  of  the  set  of  modules  involved 
in  intermodule  change  propagation  as  a  consequence  of 
modifying variable definition i of module k. Wki is formed
as follows:

$$W_{ki} = \bigcup_{j \in Z_{ki}} X_{kj}$$

Step 6: For each variable definition i in every module k,
compute LCMki as follows:

$$LCM_{ki} = \sum_{t \in W_{ki}} C_t$$

where Ct is the McCabe's complexity measure of module t.

Step 7: For each variable definition i in every module k,
compute the probability that a particular variable definition
i of module k will be selected for modification, denoted by
P(ki), as follows:

$$P(ki) = \frac{1}{\text{the number of elements in } V_k}$$

Step 8: For each module k, compute LREk and LSk as
follows:

$$LRE_k = \sum_{i \in V_k} [P(ki) \cdot LCM_{ki}]$$

$$LS_k = 1 / LRE_k$$

Step 9: Compute LREP and LSP as follows:

$$LREP = \sum_{k=1}^{n} [P(k) \cdot LRE_k]$$

where P(k) = 1/n and n is the number of modules in the program.
Then

$$LSP = 1 / LREP$$
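
Steps 5 through 9 can be sketched as follows. The sketch assumes that the
sets produced by Steps 1 through 4 and the module complexities are already
available as dictionaries. The function name, the argument layout, the
two-module data, and the choice to report an infinite stability when the
ripple effect is zero are all assumptions made for illustration.

```python
import math


def logical_stability(V, Z, X, C):
    """Compute LRE_k, LS_k, LREP and LSP from precomputed sets.

    V -- module k -> list of its variable definitions i
    Z -- (k, i)   -> interface variables j affected by definition i
    X -- (k, j)   -> modules reached by intermodule change propagation
    C -- module t -> McCabe complexity of t
    """
    LRE, LS = {}, {}
    for k, definitions in V.items():
        p = 1.0 / len(definitions)                 # Step 7: P(ki)
        total = 0.0
        for i in definitions:
            W = set()                              # Step 5: union of X_kj
            for j in Z.get((k, i), ()):
                W |= X.get((k, j), set())
            lcm = sum(C[t] for t in W)             # Step 6: LCM_ki
            total += p * lcm
        LRE[k] = total                             # Step 8
        # Assumption: a zero ripple effect is reported as infinite stability.
        LS[k] = 1.0 / total if total else math.inf
    LREP = sum(LRE.values()) / len(V)              # Step 9 with P(k) = 1/n
    LSP = 1.0 / LREP if LREP else math.inf
    return LRE, LS, LREP, LSP


# Hypothetical two-module program.
V = {"A": ["a1", "a2"], "B": ["b1"]}
Z = {("A", "a1"): {"g"}, ("A", "a2"): set(), ("B", "b1"): {"out"}}
X = {("A", "g"): {"B"}, ("B", "out"): {"A"}}
C = {"A": 2, "B": 3}
LRE, LS, LREP, LSP = logical_stability(V, Z, X, C)
print(LRE, LREP)  # prints {'A': 1.5, 'B': 2.0} 1.75
```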

V. Applications of the Logical Stability Measures

The  logical  stability  measures  presented  in  this  paper  can  be 
utilized  for  comparing  the  stability  of  alternate  versions  of  a 
module  or  a  program.  The  logical  stability  measures  can  also 
be  normalized  to  provide  an  indication  of  the  amount  of 
effort  which  will  be  needed  during  the  maintenance  phase 
to  accommodate  for  inconsistency  created  by  logical  ripple 
effect  as  a  consequence  of  a  modification.  Based  upon  these 
figures,  decisions  can  be  made  regarding  the  logical  stability 
of  a  program  and  the  modules  of  which  the  program  is  com¬ 
posed.  This  information  can  also  help  maintenance  personnel 
select  a  particular  maintenance  proposal  among  alternatives. 
For  example,  if  it  is  determined  that  a  particular  maintenance 
proposal  affects  modules  which  have  poor  stability,  then 
alternative  modifications  which  do  not  affect  these  modules 
should  be  considered.  Modules  whose  logical  stability  is  too 
low may also be selected for restructuring in order to improve
their logical stability.

The  logical  stability  measures  can  be  normalized  by  first 
modifying  the  computation  of  the  module  logical  ripple 
effect  measure  to  include  the  complexity  of  the  module 
undergoing maintenance. Let LRE+k denote this new logical
ripple effect measure for module k which is calculated as
follows:

$$LRE_k^{+} = C_k + \sum_{i \in V_k} [P(ki) \cdot LCM_{ki}]$$

where Ck is the complexity of module k. This enables LRE+k
to become an expected value for the complexity of a primitive
modification to module k. Let Cp be the total complexity
of the program which is equal to the sum of all the module
complexities in the program. Note that LRE+k ≤ Cp since the
ripple effect is bounded by the number of modules in the program.
The normalized logical ripple effect measure for module k,
denoted as LRE*k, can then be calculated as follows:

$$LRE_k^{*} = LRE_k^{+} / C_p$$

The normalized logical stability measure for module k, denoted
as LS*k, can then be calculated as follows:

$$LS_k^{*} = 1 - LRE_k^{*}$$

The normalized logical stability measure has a range of 0 to
1 with 1 the optimal logical stability. This normalized logical
stability can be utilized qualitatively or it can be correlated
with collected data to provide a quantitative measure of
stability.

The normalized logical stability measure for the program,
denoted as LSP*, can be computed by first calculating the
normalized logical ripple effect measure for the program,
denoted as LREP*, as follows:

$$LREP^{*} = \sum_{k=1}^{n} [P(k) \cdot LRE_k^{*}]$$

The normalized logical stability measure for the program can
then be calculated as follows:

$$LSP^{*} = 1 - LREP^{*}$$

LSP* has the same range and interpretation as LS*k.
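
The normalization can be sketched as below, assuming the module
complexities and the unnormalized module ripple effects are already
known. The function name and the numbers are hypothetical, and each
module is assumed equally likely to be modified.

```python
def normalized_stability(C, LRE):
    """Return normalized module stabilities and the program stability.

    C   -- module k -> complexity C_k
    LRE -- module k -> unnormalized logical ripple effect of k
    """
    Cp = sum(C.values())                  # total program complexity
    LS_star = {}
    for k in C:
        augmented = C[k] + LRE[k]         # ripple effect plus own complexity
        LS_star[k] = 1.0 - augmented / Cp
    # Program level: average normalized ripple effect, P(k) = 1/n assumed.
    LREP_star = sum(1.0 - s for s in LS_star.values()) / len(C)
    LSP_star = 1.0 - LREP_star
    return LS_star, LSP_star


# Hypothetical two-module program.
C = {"A": 2, "B": 3}
LRE = {"A": 1.5, "B": 2.0}
LS_star, LSP_star = normalized_stability(C, LRE)
print(round(LS_star["A"], 2), round(LS_star["B"], 2), round(LSP_star, 2))
# prints 0.3 0.0 0.15
```

Because every module carries the same weight, the normalized program
stability equals the mean of the normalized module stabilities.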

VI. Example

In this section the logical stability measures for the program
in Fig. 2 will be calculated according to the previously
described algorithm as follows:

$$LRE_{\mathrm{MAIN}} = 4.0, \quad LRE_{\mathrm{RROOTS}} = 2.9, \quad LRE_{\mathrm{IROOTS}} = 2.7$$

The logical stability of each of the modules is given by

$$LS_{\mathrm{MAIN}} = 0.25, \quad LS_{\mathrm{RROOTS}} = 0.34, \quad LS_{\mathrm{IROOTS}} = 0.37$$

The potential logical ripple effect of the program is

$$LREP = 3.2$$

and hence the logical stability of the program is given by

$$LSP = 0.31$$
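
The program-level figures can be checked against the three module
stabilities quoted above. The check below assumes each module is equally
likely to be modified and works only from the rounded published values,
so agreement is expected to rounding precision only.

```python
# Module stabilities as quoted in the text (rounded published values).
module_ls = [0.25, 0.34, 0.37]

# Invert LS = 1/LRE to recover each module's logical ripple effect.
module_lre = [1.0 / ls for ls in module_ls]

# Program ripple effect is the equally weighted mean; stability is its inverse.
lrep = sum(module_lre) / len(module_lre)
lsp = 1.0 / lrep

print(round(lrep, 1), round(lsp, 2))  # prints 3.2 0.31
```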

The normalized logical stability measures for each of the
modules and the program are given as follows:

$$LS^{*}_{\mathrm{MAIN}} = 0, \quad LS^{*}_{\mathrm{RROOTS}} = 0.02, \quad LS^{*}_{\mathrm{IROOTS}} = 0.06, \quad LSP^{*} = 0.0267$$

These measures indicate that the stability of the program in
Fig. 2 is extremely poor. An examination of the program provides
intuitive support of these measures since the program
utilizes  common  variables  in  every  module  as  well  as  shared 
information  in  the  form  of  passed  parameters.  Thus,  the 
change  propagation  potential  is  very  high  in  the  program. 

VII. Validation of Stability Measures

As previously mentioned, an important requirement of the
stability measures necessary to increase their applicability
and acceptance is the capability of validating them. The
previous stability measures [3], [4], [6] failed to satisfy this
requirement due to calculations involving subjective or difficult
to obtain inputs about the program being measured.
The stability measures presented in this paper do not suffer
from these limitations since they are produced from algorithms
which calculate intermodule and intramodule change
propagation properties of the program being measured. Thus,
these measures easily lend themselves to validation studies.

The stability measures presented in this paper can be validated
either directly through experimentation or indirectly
through a discussion of how they are influenced by various
established attributes of a program which affect its stability
during maintenance. The direct approach to validation requires
a large database of maintenance information for a significant
number of various types of programs in different
languages which have undergone a significant number of
modifications of a wide variety. One experimental approach
would be to examine sets of programs developed to identical
specifications but differing in design or coding. Logical stability
measures for each version of the program could then
be calculated to determine which possesses the best stability.
A set of identical modifications to the specifications of each
program could then be performed. For each modification to
each program, a logical complexity of modification, LCM,
could then be calculated based upon the difficulty of implementing
the particular modification for the program. One
particular method for calculating an LCM has previously been
described [7], [8]. After a significant number of identical specification
modifications have been implemented on all versions
of the program, an average logical complexity of modification,
ALCM, could be computed for each version of the program.
This ALCM reflects the stability of the program and, thus,
the ALCM can be utilized as a variable in the experiment.
After a significant number of sets of programs have undergone
their sets of modifications, experimental conclusions
based upon a statistical analysis of the ALCM figures and
the stability measures could be formulated.
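
One possible form of such a statistical analysis is sketched below. The
text does not name a statistic; Spearman rank correlation is used here
purely as an assumption, and the version names, LCM figures, and
stability values are invented for illustration. The rank formula used is
valid only when there are no ties.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman rank correlation for tie-free samples of equal length."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical experiment: three versions, three identical modifications each.
lcm_by_version = {"v1": [4, 6, 5], "v2": [2, 3, 1], "v3": [8, 7, 9]}
lsp_by_version = {"v1": 0.20, "v2": 0.45, "v3": 0.12}

versions = sorted(lcm_by_version)
alcm = {v: sum(lcm_by_version[v]) / len(lcm_by_version[v]) for v in versions}
rho = spearman([alcm[v] for v in versions],
               [lsp_by_version[v] for v in versions])
print(alcm, rho)  # prints {'v1': 5.0, 'v2': 2.0, 'v3': 8.0} -1.0
```

A strongly negative correlation would support the measures, since a more
stable version should show a lower average complexity of modification.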

This direct approach to validation of the stability measures
will be difficult due to the number of programs and modifications
necessary to produce significant statistical results. Thus,
this direct approach to validation will be performed utilizing
the maintenance data base which will be created in conjunction
with the validation of our program maintainability measure
which is currently under investigation.

The stability measures presented here can also be indirectly
validated by showing how the measures are affected by some
attributes of the program which affect its stability during
maintenance. One program attribute which affects maintainability
is the use of global variables. The channeling of communication
via parameter-passing rather than global variables
is characteristic of more maintainable programs [11]. Thus,
an indirect validation of the stability measures must show
that the stability of programs utilizing parameter passing is
generally better than that of programs utilizing global variables.
This can be easily shown since the calculation of LSk
is based upon the LCM of each interface variable in module
k. Since global variables are regarded as interface variables
and since the LCM of an interface variable is equal to the sum
of the complexity of the modules affected by modification
of the interface variable, LSk will be small for modules sharing
the global variable. Thus, the logical stability of the program
will also be small. On the other hand, if communication is
via parameter passing instead of global variables, the LCM of
the parameters will generally be small, and hence LSk and LSP
will generally be improved. Thus, the stability measures indicate
that the stability of programs utilizing parameter passing
is generally better than that of programs utilizing global
variables.
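
The argument can be illustrated with a small calculation. The module
names and complexities below are hypothetical, and the defining module is
assumed to contain a single variable definition so that its stability is
simply the inverse of that definition's LCM.

```python
def lcm(affected_modules, complexity):
    """Logical complexity of modification: total complexity of affected modules."""
    return sum(complexity[m] for m in affected_modules)


# Hypothetical program with four modules and their complexities.
complexity = {"MAIN": 3, "P": 4, "Q": 5, "R": 2}

# Design 1: MAIN shares a global variable with P, Q and R.
lcm_global = lcm({"P", "Q", "R"}, complexity)

# Design 2: MAIN passes the same value as a parameter to P only.
lcm_param = lcm({"P"}, complexity)

ls_global = 1.0 / lcm_global
ls_param = 1.0 / lcm_param
print(lcm_global, lcm_param)  # prints 11 4
```

The parameter-passing design yields the smaller LCM and therefore the
larger stability value.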

The  stability  of  a  program  during  maintenance  is  also  af¬ 
fected  by  the  utilization  of  data  abstractions.  Data  abstrac¬ 
tions  hide  information  about  data  which  may  undergo  mod¬ 
ification  from  the  program  modules  which  manipulate  it. 
Thus,  data  abstraction  utilization  is  characteristic  of  more 
maintainable  programs.  An  indirect  validation  of  the  stability 
measures  must,  therefore,  show  that  the  stability  of  programs 
utilizing data abstractions is generally better than that of programs
whose modules directly manipulate data structures.
This can easily be shown by examining the stability measures
of a program that utilizes data abstractions and comparing
those measures to those of an equivalent program in which the
modules directly access the data structure, i.e., data abstractions
are not utilized. The modules which utilize a data abstraction
to access a data structure will have fewer assumptions
about their interface variables and hence have higher stability
than that of the modules directly accessing the data structure
and hence having many assumptions about it. For example,
consider a data structure consisting of records where each
record has an employee number and a department number.
Assume that module INIT initializes the data structure and
orders the records by the employee number. Also, assume
modules X, Y, and Z must access the data structure to obtain
the department for a given employee number. In this design,
if module INIT is modified so that the records in the data
structure are ordered by the department instead of the employee
number, then modules X, Y, and Z must also be modified.
This potential modification is reflected in the calculation
of LS_INIT and, consequently, LSP. If, however, modules
X, Y, and Z access the data structure through a data abstrac-
affected by a change to a module, i.e., its scope of effect, is
a subset of the modules which are directly or indirectly invoked
by the modified module, i.e., its scope of control [12].
An indirect validation of the stability measures must, therefore,
show that the stability of programs possessing this type
of control and data structure is better than that of programs
which do not possess this attribute. Now a program which
exhibits this scope of effect/scope of control property has a
logical stability which is calculated from the logical stability
of its modules, each of which is bounded above by the sum of
the complexity of the modules which lie within its scope of
control. If the scope of effect of a modification to a module
does not lie within the scope of control of the module, the
logical stability of the module is only bounded above by the
complexity of the entire program. Thus, the stability measures
indicate that the stability of programs possessing the
scope of effect/scope of control attribute is generally better
than that of programs which do not possess this attribute.

Another attribute affecting program stability during maintenance
is the complexity of the program. Program complexity
directly affects the understandability of the program and,
consequently, its maintainability. An indirect validation
of the stability measures must, therefore, show that the stability
of programs with less complexity is generally better
than that of programs with more complexity. This is readily
apparent from the calculation of the logical complexity of
modification of an interface variable. Thus, complexity is
clearly reflected in the calculation of the stability measures.

The stability measures presented here can, thus, be indirectly
validated since they incorporate and reflect some aspects of
program design generally recognized as contributing to the
development of program stability during maintenance.

VIII. Conclusion and Future Research

In this paper, measures for estimating the logical stability
of a program and the modules of which the program is composed
have been presented. Algorithms for computing these
stability measures and for normalizing them have also been
given. Applications and interpretations of these stability measures
as well as an indirect validation of the measures have
been presented.

Much research remains to be done in this area. One area of
future research involves the application of the logical stability
measures to the design phase of the software life cycle. An
analysis of the control flow and the data flow of the design
of the program should provide sufficient information for
calculation of a logical stability measure during the design
phase.

Another area of future research involves the development of
a performance stability measure. Since a program modification
may result in both a logical and a performance ripple effect,
a measure for the performance stability of a program and
the modules of which the program is composed is also
necessary [7], [8].

Much research also remains to be done in the identification
of the other software quality factors contributing to maintain-
This maintainability measure must be calculable
at each phase of the software life cycle and must be validated.
Another area of future research involves the development of
automated restructuring techniques to improve both the stability
of a program and the modules of which the program is composed.
These restructuring techniques should be applicable at
each phase of the software development. Restructuring techniques
must also be developed to improve the other quality
factors contributing to maintainability. These restructuring
techniques must automatically improve the maintainability of
the program at each phase of its development. The net results
of this approach should be a significant reduction of the maintenance
costs of software programs and, consequently, a substantial
reduction in their life cycle costs. Program reliability
should also be improved because fewer errors may be injected
into the program during program changes due to its improved
maintainability.
One measure of program performance
is the execution time of the program. In
this paper a technique based on a
self-metric approach for estimating the
execution time of program paths is
presented.  Estimates  are  obtained  for 
each  of  the  operations  of  a  programming 
language.  A  program  is  then  used  to 
analyze  the  program  to  be  measured, 
inserting  additional  program  instructions 
to  obtain  statistics  regarding  the 
execution  time.  This  work  was  being  done 
to  assist  in  the  analysis  of  performance 
ripple  effect  during  program  modifica¬ 
tion.  In  this  application,  information 
may  be  needed  about  each  execution  of 
specific  paths  with  critical  timing 
constraints.  The  particular  paths  to  be 
measured,  and  the  type  of  statistics  to 
be  provided  are  determined  by  the  user. 

This  technique  has  been  implemented 
and  used  for  experiments  with  PASCAL 
programs  running  on  a  DEC  VAX  11/780 
computer.

Index  Terms  --  Dynamic  monitoring, 
execution  time,  hardware  clocks,  PASCAL 
programming  language,  performance  ripple 
effect,  program  performance. 

INTRODUCTION 

Program  performance  is  a  measure  of 
how  efficiently  a  sequence  of  statements 
of  a  computer  program  is  executed  in  a 
given  environment.  Ideally,  one  should 
be  able  to  determine  an  absolute  figure 
which  would  be  a  measure  of  the 
performance  of  a  program  and  remain 
invariant,  regardless  of  the  environmen¬ 
tal  conditions.  However,  as  will  be 
shown  later,  such  a  measure  is  extremely 
difficult  to  define,  and  hence  the 
restriction  to  "performance  in  a  given 
environment"  is  made. 

A  number  of  program  performance 
indices  have  been  proposed,  and  the  one 
that  is  most  widely  accepted  is  that  of 

*  This  work  was  supported  by  the  Rome  Air 

Development  Center,  U.S.  Air  Force 
execution time [1]. It is assumed, for
the time being, that an intuitive
definition of execution time is "the
amount  of  time  required  to  execute  a 
portion  of  code".  A  more  detailed 
discussion  and  definition  of  execution 
time  will  be  given  later. 

In  this  paper,  we  will  present  a 
technique  to  provide  an  estimate  of  the 
execution  time  of  any  set  of  statements 
in  a  program.  The  reason  behind  the 
effort  to  provide  such  information  is 
threefold  : 

1.  to  provide  the  user  with 
assistance  in  the  decision  making  process 
about  program  efficiency  when  selecting 
different  algorithms, 

2.  to  provide  a  tool  for  the 
development  of  faster  and  more  reliable 
programs.  By  having  a  frequency  count  of 
different  modules  of  a  program,  the  user 
will  be  able  to  recognize  areas  of  code 
which  are  never  executed  (indicating 
redundancy  or  possible  error),  and  also 
those  areas  on  which  to  concentrate 
optimizing  efforts  (the  performance 
bottlenecks  and  heavily  used  procedures). 
It  has  been  reported  that,  for  a  typical 
program,  approximately  3%  of  the  code 
accounts  for  50%  of  the  execution  time 
[2],

3.  to  provide  the  software 
maintenance  personnel  with  an  easy-to-use 
tool  to  detect  and  measure  the 
performance ripple effect [3,4]. Performance
ripple effect has been defined as
the  change  in  the  performance  of  modules 
as  a  consequence  of  software  modifica¬ 
tions,  and  is  due  to  the  existence  of  a 
performance  dependency  relationship 
between  two  modules,  say  A  and  B;  that 
is,  a  change  in  module  A  can  have  an 
effect  on  the  performance  of  module  B 
[3,4]. Consider two modules, A and B,
from  a  given  program,  as  shown  in  Figure 
1.  Module  B  can  be  affected  by  a  change 
in  module  A  if  there  is  any  kind  of 
linkage  between  A  and  B,  such  as  a 
control  and/or  data  flow  link.  A  change 
in  statement  S  in  A  may  affect  the 
execution  time  -  a  performance  index  -  of 
B.  In  order  to  check  the  analysis  of 
performance  ripple  effects  and  to  test 
the correctness of a modification, it is
necessary  to  test  the  performance  of  all 
paths  in  the  program  which  have 
performance  requirements. 

The  technique  we  are  going  to 
present  is  based  on  self-metric  analysis 
of  program  performance.  We  describe  a 
set  of  pseudo-statements  to  be  inserted 
by  the  programmer  into  the  program  to  be 
analyzed.  This  is  the  method  by  which 
the  critical  paths  of  the  program  are 
defined.  In  order  to  estimate  the 
execution  time  of  a  path,  we  must  know 
the  time  required  for  each  operation  in 
the  high-level  language.  Experiments  are 
performed  to  determine  the  average 
execution  time  of  each  operation  in  the 
high-level  language.  We  refer  to  these 
values  as  the  costs  of  the  operations. 
Using  a  table  of  costs  for  the  operations 
of  the  language  and  the  pseudo-statements 
inserted  by  the  programmer,  a  source 
language  program  analyzer  modifies  the 
source  program  to  include  additional 
statements  to  update  cost  counters 
associated  with  the  paths  being  timed. 
To  demonstrate  the  technique,  an  analyzer 
has  been  implemented  for  PASCAL  programs, 
using  a  cost  table  for  a  DEC  VAX  11/780 
computer.  We  will  compare  the  results 
obtained  for  two  programs  under  various 
conditions,  using  both  the  system  clock 
and  the  analyzer. 
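
The effect of the inserted counter updates can be sketched as follows.
This is a hand-instrumented illustration in Python rather than PASCAL,
not the analyzer itself: the class `PathTimer`, its methods, and the cost
values are hypothetical, and the costs are arbitrary units rather than
measured VAX 11/780 figures.

```python
class PathTimer:
    """Accumulates estimated cost for every path that is currently active."""

    def __init__(self, cost_table):
        self.cost_table = cost_table
        self.totals = {}
        self.active = set()

    def begin(self, path):
        # Plays the role of a pseudo-statement marking the start of a path.
        self.active.add(path)
        self.totals.setdefault(path, 0.0)

    def end(self, path):
        # Plays the role of a pseudo-statement marking the end of a path.
        self.active.discard(path)

    def charge(self, *operations):
        # The kind of counter update an analyzer would insert after a statement.
        cost = sum(self.cost_table[op] for op in operations)
        for path in self.active:
            self.totals[path] += cost


COSTS = {"assign": 1.0, "add": 2.0, "compare": 1.5}  # hypothetical units


def sum_to(n, timer):
    """Sum 1..n, with counter updates added by hand after each statement."""
    timer.begin("loop")
    total = 0
    timer.charge("assign")
    i = 1
    timer.charge("assign")
    while True:
        timer.charge("compare")
        if i > n:
            break
        total = total + i
        timer.charge("add", "assign")
        i = i + 1
        timer.charge("add", "assign")
    timer.end("loop")
    return total


timer = PathTimer(COSTS)
result = sum_to(3, timer)
print(result, timer.totals["loop"])  # prints 6 26.0
```

The estimate depends only on which operations were executed, so it is
repeatable from run to run regardless of system load.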

DEFINITION OF EXECUTION TIME

A  desirable  characteristic  of  any 
measure  is  that  it  must  be  repeatable, 
otherwise  it  would  be  of  little  use,  if 
any.  This  condition  restricts  the 
definition  of  execution  time  since  the 
execution  time  of  a  program  may  mean  one 
of  several  different  things,  depending  on 
the point of view from which it is being
considered.

In  a  uniprogramming  environment, 
measuring  CPU  time  is  an  easy  task  that 
requires  only  access  to  a  real-time  clock 
which  can  keep  time  for  any  desired  time 
unit.  CPU  time  charged  to  a  process,  or 
time  units  used  for  the  execution  of  the 
instructions  between  two  points  in  a 
program  are  due  only  to  the  execution  of 
that  particular  segment. 

A  user  in  a  multiprogramming 
environment  may  associate  execution  time 
with  turnaround  time;  that  is,  the  span 
of time from the moment the execution
command is issued until the moment the
task is completed; alternatively, the
execution time of the program may be
viewed as a measure of the program's
consumption of (virtual) CPU time - real
time minus interrupts - or it may signify
CPU time plus the execution of all
necessary system routines.

Thus, it is clear that there are
several sources of variation associated


with  CPU  time  consumed,  roughly 
classifiable  into  two  broad  categories  : 
variations  in  hardware  speed  and  effects 
of  system  software.  The  former  includes 
mixed  memory  speeds  for  the  different 
levels  of  memory  hierarchy,  cache 
performance,  and  the  size  of  the 
allocated  working  set.  Software  factors 
include the cost of processing interrupts
("quick" interrupt service routines are
sometimes charged to whichever process
was interrupted, because it would not be
worth the effort to charge them to the
appropriate process), context switching or
supervisor/monitor services, and the cost
of scheduling and statistical work.

Obviously,  any  arrangements  to 
reduce  these  effects  would  restrict  the 
utilization  of  the  system  by  other  users, 
and  make  any  measuring  session  cumbersome 
and  exceedingly  complex. 

It  is  true  that,  today,  most 
operating  systems  do  keep  track  of  the 
CPU  time  used  by  executing  user 
processes. But, besides the fact that
such information is plagued by the
variations already discussed, it still
retains the most problematic aspect of
timing procedures and measuring CPU
usage, namely clock resolution [5,6].
The  clock  resolution  should  be  small  when 
compared  with  the  time  spent  in  the 
procedure.  But  unfortunately,  this  is 
not  true  of  most  systems. 
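
The resolution problem can be illustrated with an idealized model. The
sketch assumes a CPU-time counter that advances only in whole ticks and
truncates; real accounting may behave differently, for example by
charging a whole tick to whichever process is running when the clock
fires. The 10 millisecond tick is the figure the text reports for
VAX/VMS CPU-time accounting; the function name and sample durations are
hypothetical.

```python
TICK_US = 10_000  # one 10 ms tick, expressed in microseconds


def observed_cpu_time_us(true_us, tick_us=TICK_US):
    """CPU time reported by an idealized counter that truncates to whole ticks."""
    return (true_us // tick_us) * tick_us


# Procedures shorter than one tick are invisible; longer ones lose up to a tick.
readings = {t: observed_cpu_time_us(t) for t in (400, 9_999, 25_000, 1_000_000)}
for true_us, seen_us in readings.items():
    print(true_us, seen_us)
```

Under this model a 400 microsecond procedure reads as zero, and a 25
millisecond one is under-reported by a fifth.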

The  IBM/370  hardware  includes  a 
time-of-day  clock  (real  time  clock)  with 
a  resolution  of  1  microsecond,  which  runs 
continuously  and  provides  timing 
information  for  operating  system 
scheduling  and  accounting  purposes.  The 
clock  is  easily  accessible  with  one 
low-level  instruction  (move  register 
type), and has been successfully used to
time procedures [7].

The DEC VAX-11/780 architecture, on
the  other  hand,  presents  a  very 
restrictive  clock  system,  as  shown  in 
Figure  2.  The  CPU  time  information 
stored  in  the  process  header  can  be  read 
by  means  of  a  system  routine  available  in 
the VAX/VMS Operating System: the "Get
Job/Process Information" system service
provides accounting, status and
identification information about a
specified process [8]. The accumulated
CPU time may be read in 10 millisecond
"tics".

Therefore, when trying to measure
the actual execution time of procedures
by means of the virtual CPU time, one
runs into two levels of difficulty: the
first level is the problem of accessing
the operating system clock registers, and
their inadequate (too coarse) resolution;
the second level is the variation
associated with the virtual CPU time due
to memory management, interrupts,
operating system service routines and
shortcuts in the accounting policy, and
the overhead introduced by the use of
software probes that call system
routines.
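
The effect of a coarse clock can be illustrated with a small simulation. The sketch below is not part of the original study: the helper names are hypothetical, and it simply assumes a clock that reports whole 10 millisecond ticks. Under that assumption, a procedure lasting 0.5 millisecond is reported as taking no time at all for most starting instants, and as taking a full tick for the rest.

```python
TICK_US = 10_000  # assumed tick length: 10 ms, expressed in microseconds


def read_clock_ticks(exact_us, tick_us=TICK_US):
    """Hypothetical clock read: number of whole ticks elapsed so far."""
    return exact_us // tick_us


def measured_duration_us(start_us, duration_us, tick_us=TICK_US):
    """Duration as seen through the tick clock (always a multiple of the tick)."""
    before = read_clock_ticks(start_us, tick_us)
    after = read_clock_ticks(start_us + duration_us, tick_us)
    return (after - before) * tick_us


def fraction_reported_as_zero(duration_us, tick_us=TICK_US):
    """Share of starting instants within one tick for which the clock sees nothing."""
    zero = sum(
        1
        for start in range(tick_us)
        if measured_duration_us(start, duration_us, tick_us) == 0
    )
    return zero / tick_us


print(fraction_reported_as_zero(500))
```

Either outcome is far from the true duration, which is why the clock resolution should be small compared with the time spent in the procedure.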

In  view  of  all  these  considerations, 
we  shall  define  execution  time  as  the 
amount  of  CPU  time  used  by  a  program  when 
a  sequence  of  statements  is  executed, 
regardless  of  the  environmental 
conditions.  The  execution  time  of  a 
sequence  of  statements,  then,  is  the  sum 
of  the  execution  times  of  the  statements 
of  the  sequence,  each  of  which  is,  in 
turn,  the  sum  of  the  execution  times  of 
the operations performed within those
statements.

METHODS  OF  MONITORING  PROGRAM  EXECUTION 

There  are  a  number  of  different 
approaches  used  to  monitor  the  behavior 
of a program. Lyon and Stillman [9]
list four typical monitoring philosophies
and  compare  them  on  the  basis  of  a  number 
of  characteristics  of  cost  and 
convenience,  such  as  portability, 
accuracy,  cost  to  prepare  a  program,  and 
clock  requirements. 

Each  of  the  four  methods  has  its 
advantages  and  disadvantages.  The  first 
method,  using  clock  interrupts  via  an 
operating  system,  is  excellent  for  use 
across  different  compilers,  but  requires 
a  fast  clock  for  good  accuracy  and 
precision. The second method,
event-driven hardware probes, is good in
every respect, but is costly to set up.
The third method, inserting calls to a
system clock, limits use to one language,
but  is  an  excellent  approach  if  there  is 
a  consistent  clock,  and  if  the  operating 
system  keeps  track  of  program-state  and 
supervisor-state  times  separately  and  for 
each  user.  The  fourth  method,  placing 
counters  inside  a  segmented  program, 
although  limited  to  one  language  and 
requiring  knowledge  of  the  approximate 
cost  for  each  statement  type,  does  have  a 
great  advantage:  during  execution  of  the 
program,  performance  monitoring  does  not 
use  any  clock. 

This latter method is also known as
the "self-metric" approach, due to the
fact that an instrumented version of the
program gathers all the information about
itself, in addition to performing its
normal function. This is a method which
has had a number of reported uses, for
monitoring  both  the  execution  time  and 
the  logical  behavior  of  the  program 
[2,9-15].

The self-metric approach typically
consists of two phases: the original
source code is first accepted as input by
a source code analyzer, which produces as
output the instrumented version of the
program, containing the necessary code
for the tallying function. Phase two
occurs when the augmented version of the
program is actually executed on the
user's original input data, producing a
report on the execution statistics in
addition to its normal output. The
entire process is represented in Figure
3.
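
The two phases can be sketched in a few lines. The example below is only an illustration under simplifying assumptions, not the analyzer described in this paper: a "program" is a list of (statement kind, action) pairs, the cost table `COST` holds made-up values, and the names `analyze` and `execute` are hypothetical.

```python
# Hypothetical relative costs per statement kind (arbitrary time units).
COST = {"assign": 2, "write": 10}


def analyze(program):
    """Phase one: build the instrumented version of the program.

    Every original statement is preceded by a tallying step that carries
    the cost looked up for its statement kind.
    """
    instrumented = []
    for kind, action in program:
        instrumented.append(("tally", COST[kind]))
        instrumented.append((kind, action))
    return instrumented


def execute(instrumented, env):
    """Phase two: run the instrumented version.

    The program performs its normal function on 'env' and, in addition,
    accumulates the estimated execution time, which is returned.
    """
    counter = 0
    for kind, payload in instrumented:
        if kind == "tally":
            counter += payload
        else:
            payload(env)
    return counter


program = [
    ("assign", lambda env: env.update(x=3)),
    ("assign", lambda env: env.update(y=env["x"] * 2)),
    ("write", lambda env: env["out"].append(env["y"])),
]

env = {"out": []}
estimated_time = execute(analyze(program), env)
print(env["out"], estimated_time)
```

The normal output (`env["out"]`) is unchanged by the instrumentation; the estimate is obtained without reading any clock.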

THE PSEUDO-STATEMENTS FOR INSTRUMENTATION PURPOSES

Four  pseudo-statements  and  one 
pseudo-declaration  are  defined  to  allow 
the  user  to  instrument  the  source  code, 
issuing  directives  to  the  analyzer  to 
take  actions  such  as  to  set  up  a  new 
counter, to turn a counter on or off
(thus defining the segment of code that
is to be monitored), to reset the value
of the counters, and to prepare the data
file  for  output  and  record  the  results. 

The  VAR  Pseudo-declaration 

This  pseudo-declaration  is  inserted 
by  the  user  in  the  global  variable 
declaration  part  of  the  source  program. 
It  instructs  the  analyzer  to  generate  the 
code  necessary  to  declare  the  global 
variables  which  will  keep  track  of  the 
statistical  measures  gathered  during  the 
execution  of  the  instrumented  version. 
The names of the probes are declared in
the program by the user. The syntax of
this pseudo-declaration is:

$$ VAR [C-i ,]* C-n ;

where C-1 ... C-n are the names of the
probes. These names must satisfy the
syntax for an identifier of the
programming language in use.

The  INIT  Pseudo-statement 


The INIT pseudo-statement initializes
all variables used for tallying
purposes, and opens and prepares for
output the data file into which the
measurements will be written. This
pseudo-statement should be used as the first
statement of the main program. The
syntax of this pseudo-statement is:

$$  INIT 

The ON Pseudo-statement

This pseudo-statement opens the
scope of a new probe and defines the
starting point of a new segment of code
which is to be time-monitored. The
syntax of this pseudo-statement is:

$$ ON PROBENAME

where PROBENAME is one of the probe names
which were declared in the VAR
pseudo-declaration, and which has not yet
been used.

The  OFF  Pseudo-statement 


The OFF pseudo-statement closes the
scope of the current probe, reestablishes
the scope of the previous probe (there is
one predefined probe), and generates
source code which will prepare the collected
data, simplify it, and output the
intermediate results. The user may
specify whether or not every
new measurement of a probe is to be
separately recorded; in either case, the
average value measured by the probe will
be computed. The syntax of this
pseudo-statement is:

$$  OFF  AVERAGE 
or  $$  OFF  NONAVERAGE 

The  AVERAGE  option  determines  that 
only  the  average  and  standard  deviation 
are  kept,  whereas  NONAVERAGE  specifies 
that  each  new  value  of  the  probe  is  also 
to  be  recorded  and  output. 

The  OFF  pseudo-statement  ends  the 
scope  of  the  last  defined  new  probe, 
therefore avoiding possible ambiguities
due  to  the  overlapping  of  probes. 
Nesting  of  probes  is  allowed,  however. 
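
One way to picture the run-time behavior of ON and OFF with nested probes is a stack of active probes over a single running cost counter. The following sketch is an assumption made for illustration, not the code generated by the analyzer: the class `ProbeMonitor`, its method names, and the choice of a population standard deviation are all hypothetical.

```python
import math


class ProbeMonitor:
    """Toy run-time support for ON/OFF probes (illustrative only)."""

    def __init__(self, names):
        # Loosely corresponds to $$ VAR plus $$ INIT: one record per probe.
        self.stats = {
            n: {"count": 0, "sum": 0, "sum_sq": 0, "max": 0, "values": []}
            for n in names
        }
        self.cost = 0     # running cost counter, in arbitrary time units
        self.active = []  # stack of (probe name, cost when turned ON)

    def tally(self, amount):
        """Stands in for the generated 'c := c + cost(S)' statements."""
        self.cost += amount

    def on(self, name):
        if name not in self.stats:
            raise KeyError(f"undeclared probe: {name}")
        self.active.append((name, self.cost))

    def off(self, record_each=False):
        """Close the most recently opened probe; the previous one resumes.

        record_each=True plays the role of NONAVERAGE: every measurement
        is kept in addition to the summary figures.
        """
        name, start = self.active.pop()
        value = self.cost - start
        s = self.stats[name]
        s["count"] += 1
        s["sum"] += value
        s["sum_sq"] += value * value
        s["max"] = max(s["max"], value)
        if record_each:
            s["values"].append(value)
        return value

    def mean(self, name):
        s = self.stats[name]
        return s["sum"] / s["count"]

    def std_dev(self, name):
        s = self.stats[name]
        variance = s["sum_sq"] / s["count"] - self.mean(name) ** 2
        return math.sqrt(max(variance, 0.0))


monitor = ProbeMonitor(["OUTER", "INNER"])
monitor.on("OUTER")
monitor.tally(5)
monitor.on("INNER")
monitor.tally(3)
monitor.off(record_each=True)  # closes INNER, OUTER stays active
monitor.tally(2)
monitor.off()                  # closes OUTER
print(monitor.mean("INNER"), monitor.mean("OUTER"))
```

Because OFF always pops the most recently opened probe, overlapping (as opposed to nested) scopes cannot be expressed, which mirrors the rule stated above.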

The  RESULT  Pseudo-statement 

This  pseudo-statement  generates 
instructions  to  output  the  gathered  data 
onto  a  data  file.  It  should  be  used 
after  the  last  executable  statement  of 
the  main  program,  although  this  is  not 
required. The syntax of this
pseudo-statement is:

$$  RESULT 

How  to  Use  the  Pseudo-statements 

The  following  steps  describe  the  use 
of  the  technique: 

1.  identify  the  number  of  sections  of 
code  to  be  monitored  and  select  an 
equal  number  of  probe  names; 

2.  in  the  variable  declaration  part  of 
the  main  program,  insert  the  VAR 
pseudo-declaration  listing  all  of 
the  selected  probe  names; 

3. for each section of code to be
monitored, insert the ON
pseudo-statement at its beginning, and
the OFF pseudo-statement at its
end, specifying the AVERAGE/NONAVERAGE
option;

4.  insert  the  INIT  pseudo-statement 
before the first executable
statement of the main program;

5.  insert  the  RESULT  pseudo-statement 
after  the  last  executable  statement 
of  the  main  program; 

6. execute the analyzer using the
segmented version of the program as
the input file and assign a new
output file to hold the
instrumented version;

7. compile, link and execute the
instrumented version;

8. the results of measuring
execution will be stored in a
standard file.

Special  Notes 

The user should be aware of the
general rule that all paths which begin
at an ON pseudo-statement must pass
through the corresponding OFF
pseudo-statement. This requirement is one of
the  difficulties  associated  with 
analyzing  programs  which  contain  jumps, 
and  hence  the  "structured"  languages 
provide  more  assistance  in  checking  that 
this  rule  is  followed.  The  following 
examples  illustrate  the  use  of  the  ON  and 
OFF  pseudo-statements  with  three 
different  PASCAL  language  constructs. 

Example 1:

    REPEAT
      $$ ON PROBE1;
    UNTIL ... ;
    $$ OFF AVERAGE

is incorrect, whereas

    REPEAT
      $$ ON PROBE1;
      $$ OFF AVERAGE
    UNTIL ... ;

is correct.

Example 2:

    IF ... THEN
      BEGIN $$ ON PROBE1; S1
      END
    ELSE
      BEGIN S2; $$ OFF AVERAGE
      END;

is incorrect, whereas

    IF ... THEN
      BEGIN $$ ON PROBE1; S1;
        $$ OFF AVERAGE
      END
    ELSE
      BEGIN $$ ON PROBE2; S2;
        $$ OFF AVERAGE
      END;

is correct.

Example 3:

    $$ ON PROBE1;
    IF ... THEN GOTO 1;
    $$ OFF AVERAGE;
    1:

is incorrect, whereas

    $$ ON PROBE1;
    IF ... THEN GOTO 1;
    1:
    $$ OFF AVERAGE

is correct.
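
For structured constructs, the rule can be checked mechanically. The sketch below is a simplified illustration and not part of the analyzer: it assumes a program is represented as nested Python lists, where each inner list is the body of one structured statement (a loop body or one branch of an IF), and the strings "ON" and "OFF" mark pseudo-statements. It accepts a block only if every ON is closed by an OFF inside the same block. Jumps, as in Example 3, are outside what this representation can express.

```python
def probes_balanced(block):
    """Return True if every ON is matched by an OFF within the same block."""
    open_probes = 0
    for item in block:
        if item == "ON":
            open_probes += 1
        elif item == "OFF":
            if open_probes == 0:
                return False  # OFF with no probe opened in this block
            open_probes -= 1
        elif isinstance(item, list):
            # A nested structured statement must be balanced on its own.
            if not probes_balanced(item):
                return False
    return open_probes == 0


# Example 1: the REPEAT body is an inner block.
print(probes_balanced([["ON", "S"], "OFF"]))   # incorrect form
print(probes_balanced([["ON", "S", "OFF"]]))   # correct form

# Example 2: the THEN and ELSE branches are separate inner blocks.
print(probes_balanced([["ON", "S1"], ["S2", "OFF"]]))
print(probes_balanced([["ON", "S1", "OFF"], ["ON", "S2", "OFF"]]))
```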


THE EXPERIMENT TO DETERMINE THE EXECUTION COSTS

In this section, we will discuss how
experiments may be conducted to determine
approximate relative costs - in units of
time - for the standard operations,
functions and procedures of the
programming language in use.

Wortman [7] has conducted a number
of experiments to compare "system time"
and "hardware time"; system time was
defined as the timing information
returned by a logical clock (26.04
microsecond resolution) maintained for a
task by IBM's OS/360 MVT operating
system; hardware time was the reading
obtained from the hardware clock (1
microsecond resolution). The "system time"
described is very similar to the virtual
CPU time maintained for each process by
the VAX/VMS operating system.

Wortman  concluded  that  the  system 
time  had  a  "normalized  standard  deviation 
that  was,  on  the  average,  two  orders  of 
magnitude  larger  than  the  normalized 
standard  deviation  observed  for  hardware 
time" [7]. He also found that the mean
values of the two measurements differed by
less than 0.6%.

This method of measurement was
adapted for the DEC VAX-11/780 computer,
using a system service routine ($GETJPI)
which gives access to the virtual CPU
time [8]. For our implementation, this
routine was used to estimate the cost of
the various operations in standard
PASCAL. Figure 4 is a schematic version
of the algorithm used.
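
The measurement loop can be approximated in a modern setting as follows. This is a minimal sketch under stated assumptions, not the implementation used in the study: Python's `time.process_time` stands in for the $GETJPI call, the function name `estimate_cost` and its parameters are hypothetical, and the running mean and deviation are computed with Welford's update, which may differ from the "dynamic" computation used originally. Like the schematic algorithm, it does not subtract the overhead of the inner loop itself.

```python
import math
import time


def estimate_cost(sample_statement, samples=20, iterations=10_000,
                  clock=time.process_time):
    """Estimate the CPU cost of one execution of 'sample_statement'.

    Each sample times 'iterations' repetitions so that the measured
    interval is large compared with the clock resolution.
    Returns (mean, standard deviation) per single execution.
    """
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for _ in range(samples):
        start = clock()
        for _ in range(iterations):
            sample_statement()
        diff = clock() - start
        # Update the running mean and deviation with the new sample.
        count += 1
        delta = diff - mean
        mean += delta / count
        m2 += delta * (diff - mean)
    std_dev = math.sqrt(m2 / count)
    return mean / iterations, std_dev / iterations


if __name__ == "__main__":
    per_call, spread = estimate_cost(lambda: 3 * 7 + 1)
    print(f"estimated cost: {per_call:.3e} s (std dev {spread:.3e} s)")
```

Raising `iterations` is the practical answer to a coarse clock: the total interval grows while the quantization error stays bounded by one tick.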


THE  ANALYZER 


The  analyzer  developed  uses  the 
self-metric  approach  described  above, 
inserting  tallying  code  which  produces 
estimated  execution  times  for  the  total 
run and for each segment of code that the
user  has  chosen  to  monitor. 

The PASCAL [16] language was chosen
because  of  its  growing  acceptance  in  many 
programming  situations,  its  elegance  and, 
most  of  all,  its  extensive  use  of 
structured  statements. 

The analyzer searches the PASCAL
code for the occurrence of reserved
words, all standard (as well as a small
number of non-standard) identifiers and
operators, and determines where to insert
code to account for the execution of
every statement. By making use of a
"cost" table, tallying statements are
generated to increment "cost" counters,
and the accumulated values are recorded
at the end of the execution run. The
Appendix describes the code which is
inserted for each PASCAL statement type.

To estimate the total execution
time, which is defined as the sum of the
execution times of each individual
statement, it is first necessary to
obtain estimates of the relative cost (in
units of time) for the individual
statement types. The algorithm used to
do this, described in the previous
section, lacks some accuracy, since it
does not take into account the code
optimization capability of the compiler.
However, the relative costs derived,
employing the most general sample
statements possible, have shown
themselves to be consistent, reliable, and
satisfactory for their intended use as
detectors of changes in performance.

Trying to incorporate the effect of
the compiler optimization would introduce
a variable which is too volatile, or
perhaps totally uncontrollable;
therefore, even at the cost of some
inaccuracy, a decision was made to keep all of
the study in a high level language
environment.

Restrictions and Extensions

An effort was made to make the
analyzer accept the entire PASCAL
language as defined in [16], but the
following problem was encountered:

The  feature  which  causes  some 
difficulty  (not  part  of  the  original 
language  definition,  but  in  common  use) 
is  the  use  of  externally  declared 
procedures  or  functions.  This  is  used 
quite  frequently  in  large  software 
systems,  to  aid  modularity  and  to  reduce 
the  time  needed  for  modification  and 
compilation. This feature is also
necessary to enable the use of
installation-defined routines existing in system
libraries. Since such routines are not
available to the analyzer, and need not
be written in PASCAL, it is not possible
for us to apply self-metric analysis to
them. Therefore, the analyzer has been
implemented to recognize such procedures
and functions, but to take no further
action.

As a result of this decision, it is
not possible for us to allow procedures
or functions as parameters, since such
procedures or functions may be declared
either within the program (and so should
be analyzed), or as external routines
(and so should not be analyzed). While
this restriction has not limited our use
of the analyzer, it may be relaxed to
forbid instead the use of any externally
declared routine as a parameter.

COMPARISON  DATA  FOR  TWO  ROUTINES 

The most desirable characteristic of


■a  ted  by  our  analyzer  and  tin-  cost.-: 
i  by  the  system  clock  show  the  same 
■  if  proportion,  with  a  percentage 
ivs  difference  of  about  40  per  cent. 
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■m  clock,  as  shown  below.  Also  as 
expected, the sequential search procedure
is shown to be highly dependent on the
particular value of the search key.

We  compare  the  figures  obtained  for 


■qu  e  n  t.  l  a  1 


(on  the 


each  pair)  with  those  for  the 
search  procedure.  Although  the 
oiitained  from  the  clock  are 
to  the  variations  previously 
■  and  the  figures  obtained  from 
t  are  based  on  apprcx  i  mat  ions 

’  ,;r.e  required  to  execute 
LAS'  'Ah  oner  at .ions  .  the  costs 


Sequential  ./Binary 
Estimated  Time  Clock  Time 
0.4/1. 5=  0.27  6/29=  0.20 

14.4/1.6=  22.9  1217/29=  42.6 

17.1/1.6=  10.8  631/17=  17.0 


COMPLETE  EXAMPLE 


A  Samplr 


ar,fi  Fide: 


i  relatively  small  PASCAL  program 
it  i  rig  of  8  modules  was  ctiosen  to 
rate-  tne  use  of  the  execution  time 
tor.  This  program  appeared  in 

and  Elder  17]  . 

ne  segmented  version  of  the  program: 

is,  the  original  source  code  plus 
seudo-statementn  inserted  by  the 
i  shown  in  Figure  6.  Each  proce- 
ni  is  the  main  program,  is  being 
red.  On  entry  to  each  procedure  a 
is  turned  “ON";  and  on  exit  from 
procedure,  the  same  probe  is  turned 
Tne  eighth  probe  monitors  the 
■  .dale  of  the  program. 

:.<•  probe  associated  with  procedure 
was  turned  "OFF",  with  the  option 
i ;p ACF. "  to  specify  that  the  oxecu- 
t  i me  e;;t  imato  for  that  procedure  is 
oitp.it  each  time  the  procedure  is 
ed .  All  other  probes  are  termi- 
w;‘L  the  "AVKKAGE"  option,  and  so 
•  •cold  .,nly  a  S’mmary  of  results. 


Interpretation of the Output Data
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:ged  only 

slightly due to a performance ripple
effect, if its maximum execution time
increases disproportionately, then that
module is a serious candidate for
reexamination.

"Total  Estimated  Execution  Time" 
gives  the  approximate  total  cost  for  the 
execution  of  the  whole  program,  in  units 
of time.

These  execution  time  figures  may  be 
compared  with  those  obtained  from 
previous  executions  of  the  program  to 
identify  performance  changes.  They  may 
also  be  compared  against  the  performance 
requirements  of  the  program  to  detect  any 
violations,  and  to  provide  an  early 
warning  of  possible  future  violations. 

CONCLUSION 

In  this  paper  we  have  shown  that 
measuring  execution  time  of  programs  to 
obtain  a  performance  index  is  not  an  easy 
task  and  requires  some  assumptions  and 
approximations to be made. Our
self-metric approach was developed to
monitor the performance of programs. The
self-metric approach implies that the
execution time estimates obtained are a
measure of only the time for executing
the various arithmetic, logic and I/O
operations encountered in the program.

The procedure for using the
pseudo-statements requires the user to
manually segment the program by inserting
pseudo-statements that are interpreted by
the analyzer. While this requirement
imposes some extra work on the part of
the user, it also provides the freedom to
monitor any section of the program that
the user considers necessary. By
appropriately placing the ON-OFF switch
pairs, the user is able to better observe
the behavior of the program.

Two  modifications  can  be  made  to  the 
current  system  to  deal  with  the  following 
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the  analyzer  could  be  nade  available  for 
the  cases  in  which  the  smallest  unit  to 
be  measured  would  be  a  procedure  or 
function.  Procedure  entry  and  exit 
points  are  easy  to  detect  automatically, 
and  such  information  is  easily  related  to 
the  use  of  ON-OFF  switches. 

The  estimator  was,  in  this  work, 
developed  for  the  detection  of 
performance  ripple  effects,  but  its  use 
can  be  extended  to  other  areas  such  as 
detecting  heavily  used  code,  identifying 
code  which  has  not  been  executed  by  any 
test  case,  defining  the  relative 
importance  of  different  modules,  and 
helping  in  selecting  between  different 
algorithms.

APPENDIX 

Instrumentation  of  PASCAL  Statements 

This  appendix  describes  standard 
PASCAL statements and their instrumented
versions as constructed by means of the
analyzer. First we make the following
definitions:

S        : Any PASCAL language
           statement;

exp      : Any expression that, when
           evaluated, yields a value
           of some scalar type (e.g.
           integer, boolean);

c        : statement cost counter;

cost(x)  : cost function for "x", in
           units of time;

ovrhd    : additional cost, not
           associated with any
           syntax unit (these are
           constants, determined by
           experiment);

var      : Any control variable;

rec var  : Any record variable;

[x]*     : indicates that "x" may
           occur zero or more times;

[x | y]  : indicates that either "x"
           or "y" will appear.

The begin-end bracket

standard syntax:

    begin [S-i ;]* S-n end ;

A "begin-end" pair encloses a
sequence of statements that are all
executed, in order of appearance. The
cost of each statement should be added to
the counter before execution of that
statement.

instrumented version:

    begin [c := c + cost(S-i) ; S-i ;]*
          c := c + cost(S-n) ; S-n ;
    end ;

cost of this statement: none.

The repeat statement

standard syntax:

    repeat [S-i ;]* S-n
    until exp ;

The statement sequence S-1,...,S-n
is repeatedly executed until "exp",
evaluated after each execution of the
statement sequence, becomes true. The
cost of this sequence is computed in a
manner analogous to that described for
the "begin-end" statement, but must also
include the cost for the evaluation of
the expression.

instrumented version:

    begin
      repeat
        [c := c + cost(S-i) ; S-i ;]*
        c := c + cost(S-n) ; S-n ;
        c := c + cost(exp) + ovrhd-2
      until exp ;
      c := c + ovrhd-3
    end ;

cost of this statement:
ovrhd-1.
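
To make the accounting concrete, the sketch below replays the tallying performed by an instrumented repeat statement. It is an illustration only: the function name and all cost values are hypothetical, and the loop is driven by a fixed iteration count instead of a real controlling expression. Under these assumptions the total is ovrhd-1 + n * (sum of statement costs + cost(exp) + ovrhd-2) + ovrhd-3 for n iterations.

```python
def instrumented_repeat_cost(body_costs, exp_cost, iterations, ovrhd):
    """Total cost tallied for a repeat statement executing 'iterations' times.

    'ovrhd' is a tuple (ovrhd-1, ovrhd-2, ovrhd-3) of experimentally
    determined constants; the values used below are made up.
    """
    ovrhd1, ovrhd2, ovrhd3 = ovrhd
    c = ovrhd1                    # cost of the repeat statement itself
    done = 0
    while True:                   # repeat
        for cost in body_costs:
            c += cost             # c := c + cost(S-i) ; S-i
        c += exp_cost + ovrhd2    # charged before "until exp" is tested
        done += 1
        if done >= iterations:    # until exp
            break
    c += ovrhd3                   # charged once, after the loop exits
    return c


print(instrumented_repeat_cost([3, 4], exp_cost=2, iterations=5, ovrhd=(1, 1, 1)))
```

A repeat body always executes at least once, so the expression is evaluated exactly as many times as the body.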

The case statement

standard syntax:

    case exp of
      [case label list-i : S-i ;]*
      case label list-n : S-n
    end ;

The case statement selects for
execution the statement whose label is
equal to the current value of the
selecting expression. The cost of
evaluating the expression is added to
that of the statement that brackets the
case statement; the cost of each
statement S-1,...,S-n is incorporated in
the statement itself.

instrumented version:

    case exp of
      [case label list-i :
         begin c := c + cost(S-i) ;
               S-i ;
               c := c + ovrhd-2
         end ;]*
      case label list-n :
         begin c := c + cost(S-n) ;
               S-n ;
               c := c + ovrhd-2
         end
    end ;

cost of this statement:
cost(exp) + ovrhd-1.

The if statement

standard syntax:

    if exp then S-1 [else S-2 | ] ;

The expression is evaluated only
once, and the action taken next depends
on its outcome: S-1 is executed if "exp"
is true and S-2 if "exp" is false. The
cost of evaluating the expression is
added to the cost of the statement within
which the "if" statement appears; the
costs of S-1 and S-2 are incorporated into
the statements themselves.

instrumented version:

    if exp
      then begin c := c + ovrhd-1 ;
                 c := c + cost(S-1) ; S-1 ;
                 c := c + ovrhd-2
           end
      [else begin c := c + ovrhd-3 ;
                  c := c + cost(S-2) ; S-2 ;
                  c := c + ovrhd-4
            end | ] ;

cost of this statement:
cost(exp).

The for statement

standard syntax:

    for var := exp-1 [to | downto] exp-2 do
      S ;

The cost of evaluating the initial
expression "exp-1" and the final
expression "exp-2", each evaluated only
once, is added to the cost of the
statement within which the "for"
statement appears; the cost of the
statement "S" is incorporated into the
statement itself.

instrumented version:

    begin for var := exp-1
              [to | downto] exp-2 do
            begin c := c + ovrhd-2 ;
                  c := c + cost(S) ; S ;
                  c := c + ovrhd-3
            end ;
          c := c + ovrhd-4
    end ;

cost of this statement:
cost(exp-1) + cost(exp-2) + ovrhd-1.

The with statement

standard syntax:

    with rec var [,rec var]* do S ;

All costs associated with evaluating
the record variable(s) are added to the
statement within which the "with"
statement occurred; the cost of the
statement S is incorporated into S itself.

instrumented version:

    with rec var [,rec var]* do
      begin c := c + cost(S) ;
            S end ;

cost of this statement: none.

The while statement

standard syntax:

    while exp do S ;

The controlling expression is
evaluated before each iteration,
therefore the cost of its evaluation is
added to the cost of the statement; the
total cost is incorporated into the
statement itself.

instrumented version:

    begin
      while exp do
        begin c := c + ovrhd-2 ;
              c := c + cost(S) ; S ;
              c := c + cost(exp) + ovrhd-3
        end ;
      c := c + ovrhd-4
    end ;

cost of this statement:
cost(exp) + ovrhd-1.
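
The while statement differs from the repeat statement in that the controlling expression is evaluated once more than the body is executed. The sketch below, which uses a hypothetical function name and made-up cost values, replays the tallying: the first evaluation is charged as the cost of the statement, and each later evaluation is charged at the end of an iteration.

```python
def instrumented_while_cost(iterations, body_cost, exp_cost, ovrhd):
    """Total cost tallied for a while statement whose body runs 'iterations' times.

    'ovrhd' is a tuple (ovrhd-1, ovrhd-2, ovrhd-3, ovrhd-4) of assumed constants.
    """
    ovrhd1, ovrhd2, ovrhd3, ovrhd4 = ovrhd
    c = exp_cost + ovrhd1          # cost of the statement: first evaluation of exp
    remaining = iterations
    while remaining > 0:           # while exp do
        c += ovrhd2
        c += body_cost             # c := c + cost(S) ; S
        remaining -= 1
        c += exp_cost + ovrhd3     # pays for the next evaluation of exp
    c += ovrhd4                    # charged once, after the loop exits
    return c


print(instrumented_while_cost(3, body_cost=5, exp_cost=2, ovrhd=(1, 1, 1, 1)))
```

With n iterations the total is (n + 1) * cost(exp) + ovrhd-1 + n * (ovrhd-2 + cost(S) + ovrhd-3) + ovrhd-4, so a loop that never executes still accounts for one evaluation of the expression.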

Procedure declaration

standard syntax:

    procedure proc-ident
        ([formal par list | ]) ;
      proc-body ;

Procedures may be called from
different points in the program where
different probes (counters) may be
active; therefore, in order to increment
the correct counter so that the cost of
executing the procedure is added to the
cost of the segment in which the call
originated, it is necessary to pass the
relevant probe as a variable parameter to
the procedure.

instrumented version:

    procedure proc-ident
        ([formal par list ; | ]
         var c : integer) ;
      proc-body ;

cost of this declaration: none.
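
The reason for the extra variable parameter can be seen in a small simulation. The sketch below is illustrative only: a mutable `Counter` object stands in for a Pascal `var c : integer` parameter, and `CALL_COST`, `BODY_COST` and the procedure `copy_item` are hypothetical. The same procedure is called while two different probes are active, and each call charges the probe that was active at its call site.

```python
class Counter:
    """Mutable cost counter, standing in for a 'var c : integer' parameter."""

    def __init__(self):
        self.value = 0


CALL_COST = 3  # hypothetical cost of a procedure call
BODY_COST = 4  # hypothetical cost of the statements in the procedure body


def copy_item(source, destination, c):
    # 'c' is the probe that was active where the call originated.
    c.value += BODY_COST
    destination.append(source.pop(0))


def run_demo():
    probe_a, probe_b = Counter(), Counter()
    src, dst = [1, 2, 3], []

    # One call made while probe A is active.
    probe_a.value += CALL_COST
    copy_item(src, dst, probe_a)

    # Two calls made while probe B is active.
    for _ in range(2):
        probe_b.value += CALL_COST
        copy_item(src, dst, probe_b)

    return probe_a.value, probe_b.value, dst


print(run_demo())
```

A single global counter inside the procedure could not attribute the cost to the right segment, which is the design choice the variable parameter addresses.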


Procedure statement

standard syntax:

    proc-ident [(actual par list) | ] ;

The active probe (counter) has to be
passed to the procedure as a parameter to
account for the cost of the execution of
the statements in the procedure.

instrumented version:

    proc-ident ([actual par list , | ]
                active-probe) ;

cost of this statement:
cost(actual par list) +
cost(procedure call).

Function declaration

standard syntax:

    function func-ident
        [(formal par list) | ]
        : func-type ;
      func-body ;

The considerations for a function
definition are analogous to those
presented for the procedure declaration.

instrumented version:

    function func-ident
        ([formal par list ; | ]
         var c : integer) : func-type ;
      func-body ;

cost of this declaration: none.

Function call

standard syntax:

    func-ident [(actual par list) | ] ;

The considerations for a function
call are analogous to those presented for
the procedure statement.

instrumented version:

    func-ident ([actual par list , | ]
                active-probe) ;

cost of this statement:
cost(actual par list) +
cost(function call).

Goto statement

standard syntax:

    [S-i ;]* goto label ;

The cost associated with the goto
statement must represent the costs of all
statements executed along the path to the
goto statement.

instrumented version:

    [c := c + cost(S-i) ; S-i ;]*
    c := c + cost(goto) ;
    goto label ;

cost of this statement: none.

Labelled statement

standard syntax:

    [S-i ;]* label : [S-j ;]*

The occurrence of a label causes a
new count to be initiated, anticipating a
later jump.

instrumented version:

    [c := c + cost(S-i) ; S-i ;]*
    label : [c := c + cost(S-j) ; S-j ;]*

cost of this statement: none.
Fig. 3 - Simplified view of the self-metric approach


The algorithm:

    VAR I, J, K : INTEGER ;
        CLOCK1, CLOCK2, DIFF : INTEGER ;
        MEAN, DEV : REAL ;

    PROCEDURE $GETJPI (VAR TIME : INTEGER) ; EXTERN ;

    BEGIN
      FOR I := 1 TO NUMBEROFMEASUREMENTSESSIONS DO
        BEGIN RESET VARIABLES ;
          FOR J := 1 TO NUMBEROFSAMPLES DO
            BEGIN $GETJPI (CLOCK1) ;
              FOR K := 1 TO NUMBEROFITERATIONS DO
                EXECUTE SAMPLE STATEMENT ;
              $GETJPI (CLOCK2) ;
              DIFF := CLOCK2 - CLOCK1 ;
              COMPUTE DYNAMIC MEAN AND
                STANDARD DEVIATION FOR "DIFF" ;
            END ;
          OUTPUT RESULTS ;
        END ;
    END .

Fig. 4 - Algorithm used to estimate the cost of operations
PROGRAM SORT (DATAFILE, OUTPUT);

CONST MAXKEY = 99999;

TYPE KEYTYPE = 0..MAXKEY;
     SOMETYPE = RECORD TRANSTYPE : (DELIVERY, DISPATCH);
                       AMOUNT : 1..MAXINT
                END;
     ITEM = RECORD KEY : KEYTYPE;
                   RESTOFRECORD : SOMETYPE
            END;
     FILETYPE = FILE OF ITEM;

VAR C, DATAFILE : FILETYPE;

(* DEFINE 8 PROBES *)
$$ VAR CXZ1,CXZ2,CXZ3,CXZ4,CXZ5,CXZ6,CXZ7,CXZ8;

PROCEDURE NATURALMERGESORT (VAR C : FILETYPE);
VAR NUMBEROFRUNS : 0..MAXINT;
    A, B : FILETYPE;
    ENDOFRUN : BOOLEAN;

  PROCEDURE COPY (VAR SOURCE, DESTINATION : FILETYPE);
  VAR COPIEDITEM : ITEM;
  BEGIN $$ ON CXZ1;
    COPIEDITEM := SOURCE^;
    GET (SOURCE);
    DESTINATION^ := COPIEDITEM;
    PUT (DESTINATION);
    IF EOF (SOURCE) THEN
      ENDOFRUN := TRUE
    ELSE ENDOFRUN := COPIEDITEM.KEY > SOURCE^.KEY;
    $$ OFF NONAVERAGE
  END; (*COPY*)

  PROCEDURE COPYARUN (VAR SOURCE, DESTINATION : FILETYPE);
  BEGIN $$ ON CXZ2;
    REPEAT
      COPY (SOURCE, DESTINATION);
    UNTIL ENDOFRUN;
    $$ OFF AVERAGE
  END; (*COPYARUN*)

  PROCEDURE DISTRIBUTE;
  BEGIN $$ ON CXZ3;
    REPEAT
      COPYARUN (C, A);
      IF NOT EOF (C) THEN COPYARUN (C, B);
    UNTIL EOF (C);
    $$ OFF AVERAGE
  END; (*DISTRIBUTE*)

  PROCEDURE MERGE;

    PROCEDURE MERGEARUNFROMAANDB;
    BEGIN $$ ON CXZ4;
      REPEAT
        IF A^.KEY < B^.KEY THEN
          BEGIN COPY (A, C);
            IF ENDOFRUN THEN COPYARUN (B, C);
          END
        ELSE BEGIN COPY (B, C);
            IF ENDOFRUN THEN COPYARUN (A, C);
          END;

Fig. 5 - A sample segmented program
      UNTIL ENDOFRUN;
      $$ OFF AVERAGE
    END; (*MERGEARUNFROMAANDB*)

  BEGIN $$ ON CXZ5;
    WHILE NOT (EOF (A) OR EOF (B)) DO
      BEGIN MERGEARUNFROMAANDB;
        NUMBEROFRUNS := NUMBEROFRUNS + 1;
      END;
    WHILE NOT EOF (A) DO
      BEGIN COPYARUN (A, C);
        NUMBEROFRUNS := NUMBEROFRUNS + 1;
      END;
    WHILE NOT EOF (B) DO
      BEGIN COPYARUN (B, C);
        NUMBEROFRUNS := NUMBEROFRUNS + 1;
      END;
    $$ OFF AVERAGE
  END; (*MERGE*)

BEGIN $$ ON CXZ6;
  REPEAT RESET (C);
    REWRITE (A); REWRITE (B);
    DISTRIBUTE;
    RESET (A); RESET (B);
    REWRITE (C);
    NUMBEROFRUNS := 0;
    MERGE;
  UNTIL NUMBEROFRUNS = 1;
  $$ OFF AVERAGE
END; (*NATURALMERGESORT*)

PROCEDURE COPYFILE (VAR F, G : FILETYPE);
BEGIN $$ ON CXZ7;
  RESET (F);
  REWRITE (G);
  WHILE NOT EOF (F) DO
    BEGIN WRITELN (F^.KEY);
      G^ := F^;
      PUT (G); GET (F);
    END;
  $$ OFF AVERAGE
END; (*COPYFILE*)

BEGIN
  (* INITIALIZE THE COUNTERS *)
  $$ INIT;
  $$ ON CXZ8;
  WRITELN (' **UNSORTED RECORD KEYS
  COPYFILE (DATAFILE, C);
  NATURALMERGESORT (C);
  WRITELN; WRITELN;
  WRITELN (' ** SORTED RECORD KEYS
  COPYFILE (C, DATAFILE);
  WRITELN (' END OF PROGRAM');
  $$ OFF AVERAGE;
  (* PRINT FINAL EXECUTION RESULTS *)
  $$ RESULT
END.

Fig. 5 - A sample segmented program (continued)
*** INTERMEDIATE RESULTS ***

CXZ2 =  4490
CXZ2 =  8970
CXZ2 = 13438
CXZ2 =  4478
CXZ2 = 13438
CXZ2 = 13450
CXZ2 = 13438
CXZ2 =  4478

*** FINAL RESULTS ***

PROBE   FREQUENCY COUNT   MEAN         STD. DEVIATION   MAXIMUM
1       24                  4274.400        4.752         4277
2        8                  9522.501     3221.567        13450
3        2                 27806.500      119.250        28045
4        2                 20442.000     3373.500        27189
5        2                 28008.500       39.750        28088
6        1                131220.000        0.000       131220
7        2                 31301.000        0.000        31301
8        1                197138.000        0.000       197138

TOTAL ESTIMATED EXECUTION TIME : 197138

Fig. 6 - Sample Execution Time Output for Program SORT
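
The summary line for probe 2 can be rechecked from the intermediate results. The short sketch below copies the eight CXZ2 values from the figure and recomputes the frequency count, mean and maximum; the function name is hypothetical. The standard deviation is left out because the exact formula used by the generated code is not stated here.

```python
# Intermediate CXZ2 values, copied from the sample output.
cxz2_values = [4490, 8970, 13438, 4478, 13438, 13450, 13438, 4478]


def summarize(values):
    """Return (frequency count, mean, maximum) for one probe."""
    return len(values), sum(values) / len(values), max(values)


count, mean, maximum = summarize(cxz2_values)
print(count, mean, maximum)
```

The recomputed mean of 9522.5 agrees with the reported 9522.501 up to rounding in the original output, and the count and maximum match the table.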
DESIGN STABILITY MEASURES FOR SOFTWARE MAINTENANCE*


Stephen  S.  Yau 

Department of Electrical Engineering
and  Computer  Science 
Northwestern  University 
Evanston,  Illinois  60201 


SUMMARY 

The high cost of software during its life
cycle can be attributed largely to software
maintenance activities, and a major portion
of these activities is to deal with the
modifications of the software. In this paper, design
stability measures which indicate the potential
ripple effect characteristics due to
modifications of the program at the design level are
presented. These measures can be generated at
any point in the design phase of the software
life cycle which enables early maintainability
feedback to the software developers. The
validation of these measures and future research
efforts involving the development of a
user-oriented maintainability measure which
incorporates the design stability measures as well
as other design measures are discussed.

Index  Terms  -  Design  stability  measures, 
program  modifications,  and  software  maint 

INTRODUCTION

The major expenses in computer systems
at present are in software. While the cost
of hardware is decreasing rapidly, software
productivity improves only slowly. Thus, the
cost of software relative to hardware is rapidly
increasing. The majority of this software cost
can be attributed to software maintenance.
The cost of maintenance activities has been
very high, ranging from 40 percent to as high
as 80 percent of the total cost during the life
cycle of large-scale software systems [Boeh7J,
Zelk78, Lien78].

The control of software maintenance costs
can be approached in several ways. One approach
is to improve the productivity of maintenance
practitioners by providing them with tools and
techniques to help them perform their
maintenance tasks. Advances in this area have included
debugging tools, program flow-charters, and
ripple effect analysis tools. Although these
design stability of each module x,
DS_x, as follows:

DS_x = 1/(1 + DLRE_x).

Step 8: Compute the program design stability
PDS as follows:

PDS = 1/(1 + \sum_{x} DLRE_x),
where x is a module in the program.
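
The two closing steps of the algorithm can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and the module-level formula DS_x = 1/(1 + DLRE_x) is inferred from the values reported in the example that follows (for instance, DLRE = 8 gives DS = 1/9). Exact fractions are used so the results can be compared directly with the values quoted in the paper.

```python
from fractions import Fraction


def module_design_stability(dlre_x):
    """DS_x = 1 / (1 + DLRE_x) for a single module x."""
    return Fraction(1, 1 + dlre_x)


def program_design_stability(dlre_by_module):
    """PDS = 1 / (1 + sum of DLRE_x over all modules x)."""
    return Fraction(1, 1 + sum(dlre_by_module.values()))


# DLRE values reported for alternative 1 in the example section.
dlre = {"SCANWORD": 8, "INKEY": 2, "READCARD": 3, "FINDWORD": 6, "PROCWORD": 0}

ds = {name: module_design_stability(value) for name, value in dlre.items()}
pds = program_design_stability(dlre)
print(ds["SCANWORD"], ds["PROCWORD"], pds)
```

A module with no recorded assumptions (DLRE = 0) gets the best possible stability of 1, and every additional assumption lowers the value toward 0.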

AN  EXAMPLE 

In this section, computation of the design
stability measures will be illustrated for two
designs of the same problem. This example is
taken from [Your79]. The problem consists of
reading text from an online keyboard and text
in a card file, dissecting the text into words,
and combining these words according to codes from
the keyboard and codes contained in the cards.
Input begins with the keyboard and continues,
character-by-character, until the ideograph "SRC"
is received. At that point, the reading of input
from cards is to commence and continue until
the ideograph "11" is reached. Input from the
keyboard then resumes. An end-of-transmission
from the keyboard triggers reading the remaining
cards. The continuous stream of text from these
two sources is to be broken into separate English
words, which are then passed individually to
a pre-existing module named PROCWORD. The high
level structure charts for each module with
corresponding inputs and outputs are shown in
the figures for alternative 1 and alternative 2.

The design stability algorithm will now
be applied for both alternatives.
ALTERNATIVE 1

Step 1
J'_{SCANWORD} = {INKEY, READCARD, FINDWORD, PROCWORD},
J_{INKEY} = J_{READCARD} = J_{FINDWORD} = J_{PROCWORD} = {SCANWORD},
R_{INKEY,SCANWORD} = {character, end-of-transmission flag},
R_{READCARD,SCANWORD} = {card image, last-card flag},
R_{FINDWORD,SCANWORD} = {word, end-of-words flag, get-character flag,
                         get-card flag, word-done flag},
R_{PROCWORD,SCANWORD} = \emptyset,
R'_{SCANWORD,INKEY} = R'_{SCANWORD,READCARD} = \emptyset,
R'_{SCANWORD,PROCWORD} = {word},
R'_{SCANWORD,FINDWORD} = {character, end-of-transmission flag,
                          card image, last-card flag, source}.

Step 2  There are no global data items.

Step 3
TP_{INKEY,SCANWORD} = 1 + 1 = 2,
TP_{READCARD,SCANWORD} = 2 + 1 = 3,
TP_{FINDWORD,SCANWORD} = 2 + 1 + 1 + 1 + 1 = 6,
TP_{PROCWORD,SCANWORD} = 0.

Step 4
TP'_{SCANWORD,INKEY} = TP'_{SCANWORD,READCARD} = 0,
TP'_{SCANWORD,PROCWORD} = 2,
TP'_{SCANWORD,FINDWORD} = 1 + 1 + 2 + 1 + 1 = 6.

Step 5  TG_x = 0 for all modules x.

Step 6
DLRE_{SCANWORD} = 8,
DLRE_{INKEY} = 2,
DLRE_{READCARD} = 3,
DLRE_{FINDWORD} = 6,
DLRE_{PROCWORD} = 0.

Step 7
DS_{SCANWORD} = 1/9,
DS_{INKEY} = 1/3,
DS_{READCARD} = 1/4,
DS_{FINDWORD} = 1/7,
DS_{PROCWORD} = 1.

Step 8  PDS = 1/20.


ALTERNATIVE 2

Step 1
J'_{SCANTEXT} = {GETWORD, PROCWORD},
J_{GETWORD} = J_{PROCWORD} = {SCANTEXT},
J'_{GETWORD} = {GETCHAR, GETCARD},
J_{GETCHAR} = J_{GETCARD} = {GETWORD},
R_{GETWORD,SCANTEXT} = {word, end-of-words flag},
R_{PROCWORD,SCANTEXT} = \emptyset,
R_{GETCHAR,GETWORD} = {character, end-of-transmission flag},
R_{GETCARD,GETWORD} = {card image, last-card flag},
R'_{SCANTEXT,GETWORD} = \emptyset,
R'_{SCANTEXT,PROCWORD} = {word},
R'_{GETWORD,GETCHAR} = R'_{GETWORD,GETCARD} = \emptyset.

Step 2  There are no global data items.

Step 3
TP_{GETWORD,SCANTEXT} = 2 + 1 = 3,
TP_{PROCWORD,SCANTEXT} = 0,
TP_{GETCHAR,GETWORD} = 1 + 1 = 2,
TP_{GETCARD,GETWORD} = 2 + 1 = 3.

Step 4
TP'_{SCANTEXT,GETWORD} = 0,
TP'_{SCANTEXT,PROCWORD} = 2,
TP'_{GETWORD,GETCHAR} = TP'_{GETWORD,GETCARD} = 0.

Step 5  TG_x = 0 for all modules x.

Step 6
DLRE_{SCANTEXT} = 2,
DLRE_{GETWORD} = 3,
DLRE_{PROCWORD} = 0,
DLRE_{GETCHAR} = 2,
DLRE_{GETCARD} = 3.

Step 7
DS_{SCANTEXT} = 1/3,
DS_{GETWORD} = 1/4,
DS_{PROCWORD} = 1,
DS_{GETCHAR} = 1/3,
DS_{GETCARD} = 1/4.

Step 8  PDS = 1/11.

Analysis of the metrics obtained for both
alternatives indicates that alternative 2 is
more stable than alternative 1. This finding
is supported by the discussion in the source
of the example that alternative 2 is easier to
program and maintain. Further analysis of these
metrics indicates that the primary sources of
instability in alternative 1 are modules
FINDWORD and SCANWORD. This finding is again
supported by the discussion in the source of
the example [Your79].
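
The comparison drawn above can be reproduced mechanically from the per-module DLRE values of the example. The following Python sketch is an illustration only; the helper names are hypothetical and the DLRE figures are the ones reported for the two alternatives. It computes PDS for each design and lists the modules with the largest DLRE, which are the least stable ones.

```python
from fractions import Fraction

# DLRE values reported for each module in the worked example.
ALTERNATIVES = {
    "alternative 1": {"SCANWORD": 8, "INKEY": 2, "READCARD": 3,
                      "FINDWORD": 6, "PROCWORD": 0},
    "alternative 2": {"SCANTEXT": 2, "GETWORD": 3, "PROCWORD": 0,
                      "GETCHAR": 2, "GETCARD": 3},
}


def program_design_stability(dlre):
    """PDS = 1 / (1 + sum of DLRE_x over all modules x)."""
    return Fraction(1, 1 + sum(dlre.values()))


def least_stable_modules(dlre, how_many=2):
    """Modules with the largest DLRE, i.e. the lowest design stability."""
    return sorted(dlre, key=dlre.get, reverse=True)[:how_many]


for name, dlre in ALTERNATIVES.items():
    print(name, program_design_stability(dlre), least_stable_modules(dlre))

# The design with the larger PDS is the more stable one.
most_stable = max(ALTERNATIVES,
                  key=lambda n: program_design_stability(ALTERNATIVES[n]))
```

Ranking by DLRE rather than by DS avoids comparing fractions and gives the same ordering, since DS decreases as DLRE grows.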

VALIDATION 

An important requirement of any metric is
the capability of validating it. In this section
both direct and indirect approaches to validating
the design stability measures will be discussed.
A direct approach to validation consisting of
experimentation with the metrics was performed
by the authors utilizing a graduate software
engineering class. The class consisted of 24
professional programmers with diverse company
experiences. The course assignment was to design
and implement an automated gradebook system in
PASCAL. The class was divided into 4 teams, each
of which was to build a program of an estimated
4K lines. The class utilized the structured
design methodology to produce a complete program
design specification. This design specification
was then utilized to compute the design stability
measures. The module design stability measures
obtained had a broad range from 1/145 to 1. It
was interesting to note that the degree of module
fan-in/fan-out did not always correlate with
the design stability. For example, many modules
with small fan-in/fan-out had poor stability
and vice versa.

Upon completion of the program design
specification, the class was then asked to submit
proposals for possible changes to the program.
Over 200 such change proposals were received.
These proposals were analyzed in terms of their
potential ripple effect if they were to be
implemented. Several interesting results of
this experiment will now be described.

The first result is that those modules which
would have contributed large ripple effects if
modified are among the modules possessing poor
design stability measures. The converse,
however, is not necessarily true. Since the
design stability measures reflect a potential
worst case ripple effect, it is possible for
modules with poor stability to be modified in
certain ways without producing a large ripple
effect.

Another result of the experiment illustrated
the diagnostic capabilities of the design
stability measures. Many of the modules found
to possess poor stability also were of weak
functional strength and were common coupled to
many other modules. It should be noted, however,
that some modules which possess poor stability
are not necessarily bad. For example,
implementations of data abstractions usually possess
poor stability. The important point is that
if the assumptions made upon a module with poor
stability are violated, the potential ripple
effect is large. Thus, these assumptions must
be examined carefully with an eye towards future
modifications.

Although the experimentation with the design
stability measures produced several interesting
results, it cannot be utilized as a complete
validation of the measures. Experiments with
maintenance-type measures can be very
misleading due to the diverse and numerous types of
maintenance tasks which may be performed. For
example, maintenance data collected regarding
the maintenance activity that a particular
program experienced may not be representative of
the maintenance activity in other programs.
A complete direct validation of the design
stability measures will, thus, require a large
database of maintenance information for a
significant number of various types of programs
which have undergone a sufficient number of
modifications of a wide variety. The short-term
possibility of utilizing such a maintenance
database for validating maintenance-type measures
is not very promising. In light of this reality
and the diverse nature of the maintenance tasks
performed by users of software systems, a more
user-oriented approach to maintenance metric
computation is needed. These user-oriented
maintainability metrics will combine the unique
potential future maintenance requirements of
a user with the characteristics of the software
associated with these potential modifications
to produce a tailored measure of the expected
maintainability to be experienced by the user.
These ideas will be described in more detail
later.

Since further experimentation utilizing
the design stability measures could be misleading
without a large maintenance database, a complete
direct validation will be delayed until the
development of a user-oriented maintainability
measure. The design stability measures can be,
however, indirectly validated by arguing how
the measures are affected by various already
established attributes of programs which affect
maintainability. It should be noted that most
of these established attributes suffer from the
same validation problems as the design stability
measures, and their acceptance is largely a
consequence of intuitive arguments.

Because one program attribute which affects
maintainability is the utilization of data
abstraction and information hiding [Parn72],
an indirect validation of the design stability
measures must show that the design stability
of programs utilizing data abstraction and
information hiding is generally better than
that of programs which do not. Since our
measures are based upon counts of assumptions made
concerning interface variables and since a lack
of data abstraction and information hiding
manifests itself in an increase in assumption
counts, it is apparent that the design stability
of programs utilizing data abstraction and
information hiding is generally better than that
of programs which do not.

The relationship of the design stability
measures with both the data abstraction and
global variable notions can be further
illustrated by the following example:

Consider the case of 3 modules A, B, and
C which share a global array of records, where
each record consists of an integer ID number
and a real balance as indicated in Figure 4.
If we also assume that no parameters are passed
between the MAIN module and modules A, B, and
C and that modules A, B, and C make assumptions
about the values of the ID number and the
balance, the following values can be obtained:


DS_{MAIN} = 1,

TG_A = TG_B = TG_C = 6 + 6 = 12,

DLRE_A = DLRE_B = DLRE_C = 12,

DS_A = DS_B = DS_C = 1/13,

PDS = 1/37.


In  Figure  5,  the  program  is  redesigned  to 
utilize  a  data  abstraction  module  X  to  eliminate 
the  need  for  having  a  global  array  of  records. 
The  data  abstraction  passes  a  single  record  to 
the  modules  A,  B,  and  C  depending  upon  some  index 
variable.  From  the  design,  the  following  values 
may  be  obtained: 


TG_A = TG_B = TG_C = 0,

TP'_{AX} = TP'_{BX} = TP'_{CX} = 3 + 2 = 5

(assuming that X makes no assumptions about the
values in the record)

TP_{XA} = TP_{XB} = TP_{XC} = 5,

DLRE_A = DLRE_B = DLRE_C = 5,

DLRE_X = 15,

DS_A = DS_B = DS_C = 1/6,

DS_X = 1/16,

PDS = 1/31.


These two examples illustrate the
detrimental effect of global data on stability as
well as the positive effect of data abstraction
modules. The data abstraction modules, although
quite unstable themselves, improve the stability
of the modules which utilize them.
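
The two program-level values quoted for Figures 4 and 5 can be checked by substituting the module DLRE values into the PDS formula. The worked arithmetic below assumes DLRE_MAIN = 0 in both designs; this is consistent with the totals given, since MAIN passes no parameters, but it is not stated explicitly in the text.

```latex
\begin{align*}
\text{Figure 4 (global array):}\quad
PDS &= \frac{1}{1 + \left(DLRE_{MAIN} + DLRE_A + DLRE_B + DLRE_C\right)}
     = \frac{1}{1 + (0 + 12 + 12 + 12)} = \frac{1}{37},\\[4pt]
\text{Figure 5 (data abstraction):}\quad
PDS &= \frac{1}{1 + \left(DLRE_{MAIN} + DLRE_A + DLRE_B + DLRE_C + DLRE_X\right)}
     = \frac{1}{1 + (0 + 5 + 5 + 5 + 15)} = \frac{1}{31}.
\end{align*}
```

The redesign moves the instability into module X (DLRE of 15) while lowering the total from 36 to 30, which is why the program-level stability improves from 1/37 to 1/31.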


program design generally recognized as
contributing to the development of program stability
during maintenance.

APPLICATIONS  OF  THE  DESIGN  STABILITY  MEASURES 

The design stability measures presented
in this paper can be utilized for comparing
alternative designs of a module or program at
any point in the design phase of the software
life cycle. The selection of alternatives which
exhibit favorable design stability measures can
lead to more maintainable programs.

The design stability measures can also be
utilized to identify portions of the program
which exhibit poor stability and, thus, may
contribute to ripple effect problems during the
maintenance phase. These portions of the program
can be easily identified by the measures and
examined for deficiencies. Those areas of the
program with poor stability can then be
redesigned incorporating such favorable design
approaches as abstraction, information hiding,
restriction of global variables and functionality
in order to improve the design stability
measures.

The design stability measures will also
be a key component of any overall maintainability
measure. As previously discussed, stability
is an important attribute of program
maintainability which must be combined with other
attributes in order to formulate a maintainability
measure. Thus, our future research efforts in
the development of a user-oriented
maintainability measure will incorporate these design
stability measures.

CONCLUSIONS  AND  FUTURE  RESEARCH 

In this paper, measures for estimating
design stability of a program and of the modules
within a program have been presented. Algorithms
for computing these design stability measures,
applications of these measures, an illustrative
example, some experimental results, and an
indirect validation of the measures have also
been presented.

Much research remains to be done in this
area. Our primary emphasis will be on the
development of a user-oriented maintainability
measure computable during the design phase of
the software life cycle. This metric will
incorporate our design stability measure as well
as design complexity and testability measures.
Much experimentation will be needed in combining
these quality attributes into a single measure.
Extensive validation on large-scale programs
will also be performed.
The design stability measures presented
here  can,  thus,  be  indirectly  validated  since 
they  incorporate  and  reflect  some  aspects  of 
ABSTRACT

The rapidly increasing cost of
software maintenance indicates the
importance of developing effective
methodologies for software maintenance.
In this paper a methodology for software
maintenance, which decomposes the
software maintenance process into four
phases, is presented. The first phase is
to analyse the software in order to
understand it. The second phase is to
generate and realize a particular
modification proposal. The third phase
is to account for all of the ripple
effects of the modification, including
both logical and performance ripple
effects. The fourth phase is to test the
modified program to insure that it
functions properly. To support a wide
spectrum of activities involved in these
four phases, a variety of software tools
have been developed. By making use of
these tools, an environment has thus been
created to assist the software
maintenance practitioners in performing
their functions more effectively.
Currently, these tools have not been
totally integrated. A method for
integrating these tools and the databases
they need into a unique maintenance
environment is presented. Most of the
tools discussed in this paper have been
demonstrated for PASCAL on a DEC
VAX-11/780 computer.

Index Terms - software maintenance,
program representation, program
modification, program editor, program
slicing, ripple effect analysis, test
case generation


INTRODUCTION

Most of the expenses associated with
computer systems are due to the cost of
developing and maintaining software. The
total U.S. expenditure on programming in
1977 was estimated at between $50 and
$100 billion, which represents more than
3% of the U.S. GNP for that year [1].

It has been estimated that by 1985, the
cost of computer software will soar to
90% of the total system expenditure [2].
This is due to the dramatically
decreasing cost of hardware and the
increasing complexity and cost of
software, which has required ever greater
human resources to develop, validate and
maintain.

It is well recognized that the
maintenance cost of software has
increased continuously and that it has
become the single dominant cost item
during the life cycle of a large-scale
software system. Estimates of
maintenance cost have been found ranging
from 40% [2], 67% [3], to as high as 80%
[4] of the total cost during the life
cycle of large-scale software systems.
Therefore, in order to reduce the high
cost of software, it is essential to
develop effective software maintenance
methodologies.

"Software maintenance" has been
defined as "the process of modifying
existing operational software while
leaving its primary functions intact"
[5]. The broad spectrum of activities
which comprise software maintenance
includes error corrections, enhancements
of capabilities, deletion of obsolete
capabilities, optimization, and minor
changes in mission requirements [6]. An
excellent review of the state-of-the-art
software maintenance techniques and tools
can be found in [7]. As indicated in
that report, much more attention has been
focused on the management aspects of
maintaining software systems than on the
technical aspects. Maintenance
programmers must still handle the
technical problems in an ad hoc manner.

Therefore there is an urgent need
for an effective software maintenance
methodology, which should not only
address all the major problems of
software maintenance, but also provide a
well-integrated maintenance environment
to effectively solve the software
maintenance problems. In this paper, we
will discuss a methodology for software
maintenance which incorporates a variety
of software tools to support a unified
maintenance environment. Most of the
tools mentioned in this paper have been
demonstrated for PASCAL on a DEC
VAX-11/780 computer. Each of these
tools, however, operates on its own
representation of programs. A method for
integrating these tools into a software
maintenance environment will also be
discussed.


OVERVIEW OF THE METHODOLOGY

Yau et al. [8] have presented an
integrated view of the software
maintenance process. Once a particular
maintenance objective has been
established, the objective can be
accomplished in the four phases as shown
in Figure 1.

The first phase is to analyse the
program in order to understand it. To
facilitate this, the requirements, the
different levels of the design and the
program itself should be clearly
described. This description of the
software system can be best prepared
during software development when each
level of the software system and its
connection with other levels are
understood. Since there are many
software systems currently in operation
which have been developed without such
descriptive support, it is necessary for
us to establish a procedure by which the
information can be constructed by
analysis of the existing programs, using
only the source code, in addition to
whatever documentation is available.

The second phase is to generate a
particular maintenance proposal so that
the maintenance objective can be
achieved. The multi-level system
description mentioned in phase 1 can be
used to determine the effects of the
maintenance objective on each of the
levels. A given program modification for
an existing program must be specified at
different levels (i.e. requirement,
specification, design and code levels).
The code-level specification of the
modification can then be realized [9] in
the program code to produce a modified
program which is subject to
re-validation.

The third phase is to account for
all the ripple effect of the
modifications proposed in phase 2. As a
result of these modifications, there may
be logical inconsistencies and/or
significant degradation in program
performance. The ripple effect analysis
technique [10-16] will identify both
logical and performance ripple effects of
the proposed program modification.

The fourth phase is to test the
modified program to insure that it
functions correctly. During the software
maintenance phase, it is important that
cost-effective testing techniques are
applied [17]. Testing of the modified
software must be done in order to detect
unexpected errors, such as dormant errors
which, although present in the software
system before the modification, may
become active errors as a result of the
modification.

If the modified program fails to
pass the testing phase, any or all of the
previous phases must be repeated,
depending on the extent and type of
failure. In the most extreme case, the
maintenance objective may itself be
considered infeasible (because of its
maintenance cost, for example), and
should be altered.


SOFTWARE MAINTENANCE PROCESS

In this section, we are going to
discuss each of the four phases of the
software maintenance process in more
detail.

Understanding  the  Software 

During the maintenance phase of the
A-7 aircraft flight program [18],
Heninger found that the existing
documentation was sparse and not
up-to-date. Therefore, she decided that
it would be more cost-effective to
re-construct the software requirements
before attempting to modify the software.
This software maintenance example
indicates the importance which
maintenance personnel place on a good
description of the software system. It
also shows that, even after the software
system has entered the operational phase,
it is still feasible to construct such a
description.



In general, the maintenance
personnel should not be required to
understand the entire system at a
detailed level, because of the cost and
time required to do so. We prefer an
approach which allows changes to a
software system to be made correctly,
with the effort to understand the system
being concentrated on only the portions
of the software system relevant to the
modification. To meet this goal, a
detailed description of the software
system should be available, which would
also record the relationships of various
components at different levels
(requirements, architectural design,
detailed design and program code). When
such a layered description of the
software system is available, tracing
changes to particular portions of the
software system may be done more easily
and more accurately.

A further benefit of such a
description is the ability to directly
relate the program code of the software
system to the modification request, which
is often expressed in terms which are
more familiar to the users of the system
than to the maintenance programmers.
This relationship is usually unclear if
only the program itself is available.
However, some useful descriptive
information can still be extracted from
the source code of the programs alone,
using automated analysis tools.

Program analysis tools have been
available for many years to provide aids
such as automatic flowcharting and
construction of call graphs. Since graph
representations of data or execution flow
are often used to describe software
system requirements (as in RSL [19], for
example), we must analyse the program so
that we can present our information in
such a format. Under these
circumstances, the most attractive
approach to the construction of this
software system description is one based
on data flow analysis of the program,
with the intention of relating inputs and
outputs of the program to each other.
The "program slicing" technique [20] can
be used for this purpose.

Program slicing refers to a process
of selecting a portion of the text of a
program to form a "slice", where the
selection is done automatically, based on
data flow analysis techniques. The user
of a program slicer must specify which
variables are of importance, and at which
point in the program their values are of
interest. These two pieces of
information constitute the "slicing
criterion". The program slicer uses the
slicing criterion to analyse the data
flow of the program to extract any code
which may contribute to the values of
those variables at that point. These
program slices are themselves
syntactically correct programs and, if
executed, will produce values equal to
those produced by the original program at
the selected point (assuming that the
original program contains no
non-terminating loops [20]).

Generating and Realizing Modification
Proposals

When a number of "change requests"
from the users are collected for
attention by the maintenance staff, a
"modification" is started. There are a
number of ways to implement a particular
modification, and each of these is known
as a "modification proposal" until we
have selected one particular modification
proposal to achieve the maintenance
objective. The elements which make up a
modification are "program changes". To
generate a modification proposal, it is
necessary to carry out activities similar
to those of requirements, design and
coding, as performed during the
development phase. The change requests
are assumed to be in an informal
notation, but the maintenance staff must
(ultimately) alter a software system
which is precisely expressed or written
in a formal language. Therefore, there
is a need to convert the modification
from an informal notation to a formal
one.

We have chosen to attack this
problem from both directions: from the
direction of the informally defined
changes and from the direction of the
formally defined software. First, we
need a method whereby we can relate each
item informally mentioned in each change
request to some known entity in the
software system. In addition, we must
determine how the new behavior required
of those items may be formally described
to generate a formal modification
proposal to the software system. We call
this process the specification of a
modification proposal. Figure I shows
the relationship between the level of the
modification proposal and the level of
the software description.

The following steps are repeated for
each level of the modification proposal:

1. Identify the description level to
which the modification applies.

2. Define the interface between each
change and the software system.
3. Trace the effects of each change at
this level.

4. Restructure the software system to
reduce extraneous effects.

5. Tentatively make the modification.

6. Check the correctness of the
modification.

7. Refine the modification proposal by
decomposing each change into one or
more changes to the next level.

effect analysis (phase 3) and testing
(phase 4).


The  effects  of  tne  cnanges  can  be 
traced  at  a  particular  level  'of 
description  by  performing  ‘ripple  effect* 
analysis  on  a  model  of  tnat  level  of 
description,  and  at  lower  levels,  using  a 
definition  of  the  interface  between  each 
level  of  description.  SREM  ( 19 ]  provides 
some  tracing  information  describing  the 
preparation  of  the  software  system 
requirements,  and  some  information  about 
tne  inter-connections  between  different 
requirements,  but  tnis  is  not  adequate 
for  our  detailed  analysis,  nor  is  it 
relevant  to  the  software  design  or  code. 

In  addition,  since  most  software 
systems  are  written  in  well-defined 
programming  languages,  we  can  define  a 
formal  model  of  a  software  system  and  a 
set  of  formal  operations  on  tnat  model. 
Tnis  means  that  a  modification  proposal 
can  be  stated  as  a  set  of  formal 
modification  operations,  which  are  to  be 
applied  to  tne  software  systan  to 
implement  that  modification  proposal.  He 
call  this  process  the  real last  ion  of  a 
modification  proposal. 

Program  modification  of  an  existing 
software  system  must  be  carried  out  by 
physically  modifying  the  software  at  the 
code  level,  either  by  correcting  the 
existing  program  code  or  by  developing 
new  segments  of  program  cods  (21]. 
Modifying  programs,  however,  is  an 
incremental  process.  He  have  developed 
techniques  to  assist  programmers  in 
modifying  only  the  relevant  portions  of 
toe  program  and  in  re-asserting  its 
correctness  with  a  minimal  amount  of 
re-analysis  of  the  program  [22]. 
Incremental  program  modification  snould 
be  conducted  interactively,  so  that 
maintenance  programmers  can  expect 
instanc  feedback  on  the  effects  of  the 
modification,  and  thus  be  able  to  make 
program  modifications  more  intelligently. 
This  approacn  is  obviously  advantageous, 
because  the  length  of  tne  f lx- and -comp lie 
cycle  is  shortened?  After  the  program 
has  been  ‘fixed*,  it  is  ready  for  ripple 


Ripple Effect Analysis

An important factor contributing to
the high cost and complexity of software
maintenance is that the effects of
program modification are usually not
restricted to the location of the
modification, but propagate to other
portions of the program. This phenomenon
has been fully described in [10] and is
called the 'ripple effect' of program
modification. Ripple effect analysis
techniques have been developed for
analyzing two aspects of these ripple
effects, the logical or functional aspect
and the performance aspect. Logical
ripple effect analysis involves the
identification of program areas which may
require additional maintenance to ensure
the logical or functional consistency of
the software. Performance ripple effect
analysis involves the identification of
performance repercussions throughout the
software system as a result of the
changes to one program area.

We have made an extensive study of
logical ripple effect analysis techniques
[10-13]. The phenomenon of logical
ripple effects is a serious problem for
maintenance programmers who must modify
large-scale software systems since the
repercussions from their modifications
are rarely obvious. Our automated
technique to perform logical ripple
effect analysis is based on a model of
the data and control dependencies which
exist in programs. We extend the data
flow to include not only USED and DEFINED
sets, but also a mapping to show how
variables are used to define other
variables. This model is called an
'error flow' model, since it shows the
means by which potential errors may
propagate through a program.

When a modification is made to a
program, changes occur in the data flow
of the program. A set of variables,
known as the primary error source set, is
directly affected by the modification.
Our ripple effect tracing algorithms use
the arcs of the data flow graph to
determine where the effects of the
primary error source set may reach, and
hence all the potential logical ripple
effects of the modification are
identified.
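
The tracing idea can be illustrated with a minimal sketch. It is not the algorithm of [10-13]: it assumes a single module whose statements are plain assignments, and the names build_error_flow and trace_ripple, as well as the sample statements, are hypothetical. Each arc records that a used variable helps define another variable, and the trace collects everything reachable from the primary error source set.

```python
from collections import deque


def build_error_flow(statements):
    """Map each variable to the variables it is used to define.

    `statements` is a list of (defined_var, used_vars) pairs, one per
    assignment; an arc u -> d means an error in u may reach d.
    """
    flow = {}
    for defined, used in statements:
        for u in used:
            flow.setdefault(u, set()).add(defined)
    return flow


def trace_ripple(flow, primary_error_sources):
    """Return every variable reachable from the primary error source set."""
    affected = set(primary_error_sources)
    queue = deque(primary_error_sources)
    while queue:
        var = queue.popleft()
        for nxt in flow.get(var, ()):
            if nxt not in affected:
                affected.add(nxt)
                queue.append(nxt)
    return affected


# Hypothetical module: b := a + 1; c := b * 2; d := e; f := c + d
statements = [("b", ["a"]), ("c", ["b"]), ("d", ["e"]), ("f", ["c", "d"])]
flow = build_error_flow(statements)
ripple = trace_ripple(flow, {"b"})
print(sorted(ripple))  # ['b', 'c', 'f']
```

A modification that directly affects b may therefore ripple to c and f, while a, d and e need no re-examination in this toy example.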

Logical ripple effect analysis can
be decomposed into two stages. The first
stage is the information construction
stage, where both the intramodule error
flow model and the intermodule error flow
model are constructed. The second stage
is the error flow tracing stage. Two
difficulties associated with logical
ripple effect analysis are further caused
by recursion and dynamic aliasing, due to
the fact that logical ripple effect
analysis is based on a static analysis of
the data flow properties of the program.
Both problems have recently been solved
under certain reasonable assumptions
[13].

We have also initiated the study of
performance ripple effect analysis
techniques [14-16]. Since large-scale
software systems often have strict
performance requirements, it is also
important to ensure that program
modifications do not degrade program
performance.

When modifications are made to a
program, performance ripple effects occur
as well as logical ripple effects. We
have developed a model of the ways in
which performance ripple effects may
propagate as a result of program
modification. In this model we identify
attributes of the program which affect
its overall performance. These
attributes are quantifiable measures of
performance. The most obvious example of
a performance attribute is the execution
time of a module.

By identifying the performance
dependency relationships which exist
between performance attributes, we can
construct a complete model of potential
performance ripple effect propagation.
When a change is made to a section of the
program code, certain performance
attributes may be affected. These
performance attributes may, in turn,
affect other performance attributes in
the program due to a performance
dependency relationship. Performance
dependency relationships are created as a
result of certain mechanisms in the
program. For example, calling a module
is a mechanism which creates a
performance dependency relationship from
the called module to the calling module.
Specifically, this means that the
execution time attribute of the called
module may affect the execution time of
the calling module.

In [15] we describe a number of
performance attributes and the possible
performance dependency relationships
between them. Using this model we have
also developed algorithms to trace the
potential performance ripple effects from
an initial modification. These
algorithms are also presented in [15].
Yau et al [23] refined some of these
techniques and verified the basic
formulation of the approach using an
automated tool.
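
As a hedged illustration of the module-call mechanism described above, the sketch below derives dependency arcs from a call relation and follows them from a changed attribute. It is not the algorithm of [15]: it models only the execution time attribute, and the function names, the "exec_time" label and the sample call relation are assumptions made for the example.

```python
def performance_dependencies(calls):
    """Derive dependency arcs from the module-call mechanism.

    `calls` maps a calling module to the modules it calls. Each call
    creates an arc from the callee's execution time to the caller's.
    """
    arcs = {}
    for caller, callees in calls.items():
        for callee in callees:
            source = (callee, "exec_time")
            arcs.setdefault(source, set()).add((caller, "exec_time"))
    return arcs


def trace_performance_ripple(arcs, changed):
    """Collect the attributes that may be affected by the changed ones."""
    affected, stack = set(), list(changed)
    while stack:
        attr = stack.pop()
        for dep in arcs.get(attr, ()):
            if dep not in affected:
                affected.add(dep)
                stack.append(dep)
    return affected


# Hypothetical call relation: main calls parse and report; parse calls scan.
calls = {"main": ["parse", "report"], "parse": ["scan"], "report": []}
arcs = performance_dependencies(calls)
affected = trace_performance_ripple(arcs, [("scan", "exec_time")])
print(sorted(affected))  # [('main', 'exec_time'), ('parse', 'exec_time')]
```

Other mechanisms and attributes would contribute further arcs to the same structure; only the call mechanism named in the text is modelled here.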


Effective Testing for Software
Maintenance

After all the modifications and
their ripple effects have been
accommodated, testing is performed.
Testing is done to validate the modified
program in order to detect unexpected
errors due to the modifications, such as
previously dormant errors which may have
become active errors due to the
modification. A complete testing
strategy for the maintenance phase
consists of module testing, integration
testing, and system function testing. We
have concentrated on a module testing
technique which is part of an overall
testing strategy for software
maintenance. This technique uses the
input partition method for test case
generation and the data-driven symbolic
evaluation method for test case
execution.


For each of the modified modules,
test cases are generated by comparing the
detailed specifications and the program
code. Whenever possible, we will use
test cases in the original test set which
go through any modified portion of the
program. However, it is also necessary
to generate additional test cases. These
test cases are then used to evaluate the
behavior of the modified software. Our
approach is to use symbolic execution,
driven by actual test case data, to
produce symbolic test results. In
addition to test case generation and test
case execution, the technique also
supports debugging of the module when the
existence of errors has been detected.
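
The idea of symbolic execution driven by actual test data can be shown with a small sketch. It is an assumption-laden toy, not the modified ATTEST system: programs are nested tuples in a made-up form, only a few operators are supported, and every name below is hypothetical. Each variable carries both a concrete value, which selects the branch to follow, and a symbolic expression over the inputs, which is the symbolic test result.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       "<": operator.lt, ">": operator.gt}


def evaluate(expr, env):
    """Return (concrete value, symbolic text) of expr under env."""
    if isinstance(expr, (int, float)):
        return expr, str(expr)
    if isinstance(expr, str):
        return env[expr]
    op, left, right = expr
    lv, ls = evaluate(left, env)
    rv, rs = evaluate(right, env)
    return OPS[op](lv, rv), f"({ls} {op} {rs})"


def run(block, env, path):
    """Execute a block, letting the concrete data pick each branch."""
    for stmt in block:
        if stmt[0] == "assign":
            _, var, expr = stmt
            env[var] = evaluate(expr, env)
        else:  # ("if", condition, then_block, else_block)
            _, cond, then_block, else_block = stmt
            taken, text = evaluate(cond, env)
            path.append(text if taken else f"not {text}")
            run(then_block if taken else else_block, env, path)


def data_driven_symbolic_execution(program, inputs):
    """Return the final environment and the path condition for one test case."""
    env = {name: (value, name) for name, value in inputs.items()}
    path = []
    run(program, env, path)
    return env, path


# Hypothetical module computing r = |x - y|
program = [
    ("assign", "d", ("-", "x", "y")),
    ("if", ("<", "d", 0),
        [("assign", "r", ("*", "d", -1))],
        [("assign", "r", "d")]),
]
env, path = data_driven_symbolic_execution(program, {"x": 2, "y": 5})
print(env["r"], path)
```

For the test case x = 2, y = 5 the sketch yields the concrete result 3 together with the symbolic result ((x - y) * -1) under the path condition ((x - y) < 0), which can be compared against the specification for output validation.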


SOFTWARE  MAINTENANCE  ENVIRONMENT 

In the following sections we will
describe software tools which we have
developed for software maintenance.
These tools have been demonstrated on a
DEC VAX-11/780 computer under the VMS
operating system.

The Syntax-directed Program Editor

One  major  contribution  made  by 
syntax-directed  editors  is  that  they 
treat  a  program  as  a  well-formed 
collection  of  syntactic  units  (language 
constructs),  not  just  text. 

We have developed a syntax-directed
editor which uses three classes of
editing commands: basic modification
commands, cursor movement commands, and
extended modification commands. The
basic modification commands include ADD,
inserta, inserts, delete and change.
These commands are 'basic' because they
provide the basic mechanisms to enable
maintenance programmers to modify
programs. The cursor movement commands
include UP, DOWN, LEFT, RIGHT and
DIAGONAL. Making use of these cursor
movement commands facilitates 'structural
movement' rather than 'textual movement'
through the program. With these
commands, programmers can make more
sensible moves to locate the desired
constructs. The extended modification
commands include CUT, RASTER, PASTE,
COPY and REPLACE. These extended
commands provide further editing power
for the user.
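
Structural movement can be pictured on a small syntax tree. The sketch below is illustrative only and does not reproduce the editor's command semantics, which are defined in [9]. It assumes that UP moves to the enclosing construct, DOWN to the first nested construct, and LEFT and RIGHT to the neighbouring constructs at the same level; DIAGONAL is left out. The Node class and the move function are hypothetical.

```python
class Node:
    """A syntactic unit with links to its parent and children."""

    def __init__(self, label, children=()):
        self.label = label
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


def move(node, command):
    """Return the construct reached by a cursor command (assumed semantics).

    The cursor stays in place when the requested move is not possible.
    """
    if command == "UP":
        return node.parent if node.parent is not None else node
    if command == "DOWN":
        return node.children[0] if node.children else node
    if command not in ("LEFT", "RIGHT"):
        raise ValueError("unsupported command: " + command)
    if node.parent is None:
        return node
    siblings = node.parent.children
    index = siblings.index(node) + (1 if command == "RIGHT" else -1)
    return siblings[index] if 0 <= index < len(siblings) else node


# Hypothetical tree for: while <condition> do begin read; case end
read_stmt, case_stmt = Node("read"), Node("case")
body = Node("body", [read_stmt, case_stmt])
loop = Node("while", [Node("condition"), body])

cursor = loop
for command in ["DOWN", "RIGHT", "DOWN", "RIGHT"]:
    cursor = move(cursor, command)
print(cursor.label)  # case
```

Four structural moves reach the case statement regardless of how many lines of text separate it from the loop header, which is the point of moving by construct rather than by character.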

Details about the mechanism working
behind these commands can be seen in [9].
This editor operates on a syntax-oriented
program representation which is also
fully described in [9].

An incremental analysis mechanism
must be associated with the editor to
evaluate the static semantics of
programs. For example, the command to
delete a variable declaration may trigger
the invocation of a semantic checking
routine which highlights all the usages
of that variable, to remind the
programmer of the existence of a
potential semantic inconsistency.
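
A minimal sketch of such a check follows. It assumes a flat list of statements tagged with the identifiers they use, whereas the real editor works on the syntax-oriented representation of [9]; the function name and the sample data are hypothetical.

```python
def usages_after_delete(declarations, statements, name):
    """Delete `name` from the declarations and report where it is still used.

    `statements` is a list of (line_number, identifiers_used) pairs.
    """
    declarations.discard(name)
    return [line for line, idents in statements if name in idents]


declarations = {"i", "j", "k"}
statements = [(1, {"j"}), (2, {"i", "j"}), (3, {"i"}), (4, {"k", "j"})]
highlight = usages_after_delete(declarations, statements, "i")
print(highlight)  # [2, 3]
```

The returned line numbers are the places an editor would highlight after the declaration of i is deleted.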

The syntax-directed editor is also
supported by a screen-oriented
pretty-printer which allows the
programmer to view the portion of the
program being edited. The programmer
first uses cursor movement commands to
examine the program, then uses
modification commands to modify the
program. The pretty-printer responds to
cursor movement commands and recognizes
program changes by examining the program
representation. It then rebuilds the
screen display according to the change.
As a result, the pretty-printer provides
instant visual feedback to assist the
programmer to perceive program changes in
an interactive manner. Figure 3
illustrates the structural cursor
movement commands.

The Syntax-directed Program Slicer

Weiser's program slicer [20]
operated on a conventional form of data
flow graph [24] (i.e. a directed graph
whose nodes represent the conditions and
assignment statements of the program and
whose edges represent possible data flow
paths between them).

In an interactive programming
environment, and in normal practice, it
is more useful to display the text
(including data declarations) of the code
in the slice. We have developed a
program slicer which meets these
requirements. Our program slicer
interactively constructs the text of a
partial program (or "slice") which
satisfies the slicing criterion. Each
slice is a syntactically correct program,
made up of a subset of the declarations
and statements of the original program.
To achieve this, we have extended the
program representation mentioned in the
previous section to include the data flow
information which the program slicer
needs.

Our current approach is based on an
intramodule program slicer. It selects a
portion of a module (i.e. procedure or
function) according to the slicing
criterion, and adds to it the
declarations of objects inside or outside
the module to insure that it forms a
syntactically correct program. In PASCAL
these objects include labels, constants,
types, variables, procedures and
functions. To improve the usefulness of
this program slicer as a programming aid,
we have added options of further
applying the slicer to existing slices of
a program - to obtain a more refined
picture of program behavior - and of
combining slices (possibly those of
distinct modules) into more comprehensive
units. The operations which are
available to combine program slices are
UNION and INTERSECTION of program slices.
Figure 4 illustrates how our program
slicing technique works.
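
The slicing criterion and the two combining operations can be sketched as follows. This is a deliberately reduced illustration, not the slicer described here: it assumes straight-line code with one defined variable per statement, ignores control flow and declarations, and represents a slice as a set of line numbers so that UNION and INTERSECTION become set operations. The function name and the sample program are hypothetical.

```python
def backward_slice(program, variables, point):
    """Lines at or before `point` that can affect `variables` there.

    `program` is a list of (line, defined_var, used_vars) triples for
    straight-line code; (variables, point) is the slicing criterion.
    """
    relevant = set(variables)
    selected = set()
    for line, defined, used in reversed(program):
        if line > point:
            continue
        if defined in relevant:
            selected.add(line)
            # This definition kills earlier ones; its operands become relevant.
            relevant.discard(defined)
            relevant.update(used)
    return selected


program = [
    (1, "a", []),          # read(a)
    (2, "b", []),          # read(b)
    (3, "s", ["a", "b"]),  # s := a + b
    (4, "p", ["a"]),       # p := a * 2
    (5, "t", ["s", "p"]),  # t := s - p
]
slice_s = backward_slice(program, {"s"}, 5)
slice_p = backward_slice(program, {"p"}, 5)
print(sorted(slice_s), sorted(slice_p))              # [1, 2, 3] [1, 4]
print(sorted(slice_s | slice_p), sorted(slice_s & slice_p))  # [1, 2, 3, 4] [1]
```

The union shows everything needed to compute either s or p, and the intersection shows the statements the two computations share.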

The  Logical  and  Performance  Ripple  Effect 
Analyzer 

A software support system, the
'logical ripple effect analyzer', for
performing logical ripple effect analysis
on PASCAL programs has been developed.
This support system consists of three
subsystems: an intramodule error flow
analyzer, an intermodule error flow
analyzer, and a logical ripple effect
identification subsystem. The
intramodule error flow analyzer was
developed by modifying an existing PASCAL
compiler. The other two subsystems were
newly developed. These programs operate
on the intramodule error flow model and
the intermodule error flow model.

The program analyser developed to
construct a 'performance ripple effect
model' for PASCAL programs has been
implemented by modifying the same PASCAL
compiler. A program to trace performance
ripple effects has also been written,
which handles initialization of the data
structures in the program, user
interaction, and the tracing algorithms
themselves. The 'performance ripple
effect analyzer' consists of these two
programs.

Testing by Symbolic Execution

Our current results in software
testing are limited to module testing.
We have demonstrated this technique for
programs written in ANSI FORTRAN since
our implementation makes use of existing
tools for data flow analysis (DAVE [25])
and symbolic execution (ATTEST [26]),
which only operate on FORTRAN programs.

We use the DAVE data-flow analysis
system [25] as a preprocessor to produce
the control graph of a program to be
analyzed. From this graph we use a
program graph generator, which we have
developed, to construct the program graph
for further analysis. The tokens of the
program, which are produced by the DAVE
system, are used by an intermediate code
generator, which is a part of ATTEST's
preprocessor [26], to construct an
intermediate code representation of the
program. This intermediate code will be
used for symbolic execution of the
program.

We have also developed a
modification handler to store
modification information in the program
graph produced by the program graph
generator. The ATTEST symbolic execution
system was modified to permit data driven
execution, and this modified system is
used for test case selection and test
case execution. The results of data
driven symbolic execution are used for
output validation.

A test execution tool was developed
to perform test execution interactively.
This tool is used for debugging, and uses
four types of command: test case
specification commands, move commands,
show commands and breakpoint commands.

Although it has been demonstrated
for FORTRAN programs, this module testing
technique can be adapted to block
structured programming languages by
altering the front-end (preprocessor) and
the user interface (modification
handler).

AN INTEGRATED SOFTWARE MAINTENANCE
ENVIRONMENT

Before an integrated system can be
achieved, experimentation must be
performed, based on independent execution
of each tool currently existing in our
software maintenance environment.
Results of these separate experiments
have been described in [9,13,17,23].
Although we are convinced that our
techniques can benefit maintenance
personnel in a direct fashion, further
investigation into various other aspects
of software maintenance is still
required. Because an environment of this
kind is highly experimental in nature, we
must pay equal attention to tool
construction and environment
experimentation in the future.

Any programming environment must be
highly experimental in nature [27]. The
performance of each tool in a particular
environment must be studied and altered
accordingly in order to achieve a highly
effective integrated system. The
software tools described above have been
demonstrated independently in order to
show that their implementation was
feasible. Consequently, each tool has
been developed to operate on its own
program model (or representation),
although together they provide a wide
spectrum of program analyses.

However, our experience has shown
that information constructed for each
independent tool can also be shared among
several tools of similar nature. For
example, data flow information appears
both in the program representation used
by the program slicer and the program
editor, and in the error flow model used
by the logical ripple effect analyzer.
The program representation used by the
program editor implicitly contains the
control flow information which is
essential to the module testing tools.
The performance analysis tools also
require information regarding control
flow and data flow, although they also
require additional performance oriented
information.

In order to integrate all these
tools to form an effective maintenance
machine, several models representing
various aspects of a software system may
exist simultaneously. We view these
different pieces of information
collectively as a portion of the
multi-level software system description.
However, it is still necessary to develop
a mechanism whereby the maintenance
activities can be carried out
harmoniously and efficiently. This
mechanism may be considered to be a
'modification session manager', which
will support a friendly user interface
and effective and accurate information
handling. The modification session
manager has the responsibility of
controlling the users' use of the
different software tools in performing
modification activities. Figure 3 shows
how such a system can be organized.

CONCLUSION

In this paper we have presented a
comprehensive software maintenance
methodology. All phases contained in
this methodology and techniques involved
in each phase have been briefly
described. The status of prototype
systems based on various techniques has
been discussed. Based on the framework
reported here, we expect to conduct full
experiments in applying our methodology
to a large-scale software system in the
near future.

Figure 1. The Software Maintenance Process

Figure 2. The approach to specifying
modification proposals.

begin
   reset (data) ;
   while (not eof(data)) do
   begin
      while (not eoln(data)) do
      *1  *2
      begin
         read (data, ch) ;
         *4
         case ord (ch) of
            ff : begin
                    write (ch) ;
                    read (carr)
                 end ;
            cr : write (ch) ;
            lf : writeln
            otherwise write (ch)
         end
      end ;
      readln (data)
   end
end.

position *1 to position *2 - RIGHT
position *1 to position *3 - DOWN
position *3 to position *4 - RIGHT
position *4 to position *5 - DOWN

Figure 3. Structural cursor movements


program triangle (input, output) ;
{ This program builds a digit triangle }
var i,j,k : integer;
begin
   for j := 1 to 9 do
   begin
      i := 1;
      for i := 1 to j do
         write (i:1) ;
      for k := j downto 2 do
         write (k-1:1) ;
      writeln;
   end
end { triangle } .

(a)


program triangle (input, output) ;
var j,k : integer;
begin
   for j := 1 to 9 do
   begin
      for k := j downto 2 do
   end
end.

(b)

Figure 4. (a) The program to be sliced.
(b) An illustration of the syntax-directed
program slicing technique (slicing for
variable k).